The commonest and most profitable method of instruction in the 
social sciences and in other “content” studies is undoubtedly through 
the printed page. The improvement of teaching in these studies 
depends, more than on any one other factor, on the improvement of 
reading materials. 

Three phases in the attitudes of educators to this problem may be 
noted. In the first, the emphasis is almost entirely on content. 
The era of the “textbook” writer, primarily interested in the scholarly 
aspects of his work, and grossly ignorant or unmindful of the nature 
and capacities of the learner, is unfortunately not past, but needs 
little comment. 

A second trend, which has developed in recent years, can be styled 
the study of “difficulty.” Studies of word difficulties and of the 
vocabularies of children, such as those of Horn, Pressey and others, 
made it possible for curriculum makers to know in a fairly reliable 
degree what words could be understood by their prospective audiences. 
Further experiments in the psychology of reading gave notions of 
the nature of the process of understanding paragraphs and sentences. 
Some of the difficulties found in history textbooks have been enumerated 
as “literary embellishments, abstract thought, abstract words, 
technical language and long involved sentences.”² This type of 
research has greatly improved the reading materials in the fields in 
which it has been applied, but in a negative manner. The core has 
remained the same, though bettered by the weeding out of unsuitable 
elements. 

¹ This is one of a series of learning studies organized in the Lincoln School 
Social Science Laboratory by Harold Rugg, Director. 
² Ayer, Adelaide M.: Some Difficulties in Elementary School History. Teachers 
College Contributions to Education, No. 212, 1926. 

A third attitude, now being expressed by some students of the 
curriculum, would discard entirely the fundamental stuff of the old 
textbook material. They insist that “the materials of instruction, 
now thoroughly denatured of imaginative content must become 
dramatic, vivid, compelling . . . their volume must be greatly 
expanded . . . The dramatic episode will become one of the chief 
vehicles from which understanding shall emerge. The bare facts 
of the curriculum will be woven into vital accounts of the interplay 
of human beings upon each other. Concepts . . . cues to understanding, 
acquire rich meanings only through the study of cases, 
episodes, concrete situations.”¹ The reading materials devised by 
this group² are marked in the treatment of social and historical 
themes by dramatic episodes, together with vivid passages of generalization 
which sum up and integrate the topics. 

This experiment was framed in an attempt to explore and evaluate 
certain aspects of the newer “episode” and “generalization” types of 
reading material in comparison with the “textbook narrative” type. 
These were compared not only individually but also in various 
sequences and arrangements. The questions that the experimenter 
set out to answer were: From the reading of which of these types does 
the pupil secure the greatest immediate knowledge of a social science 
concept? Does the successive reading of different units bring about a 
different level of achievement from the repeated reading of one unit? 
What is the best order for combining the “episode” and “generalization,” 
and how does the individual effectiveness of these units 
depend on their positions in the series? 


THE EXPERIMENT 


The general procedure of the experiment was this: Eight one-page 
units of reading material were written, in which the word difficulty 
and other mechanical factors were made as nearly equal 
as possible. Four were of the textbook narrative style, two of the 
episodic narrative and two of the generalization style. These were 
variously combined in five sequences of four “stories” each. One 
sequence was read by the pupils of each of the five experimental 
groups. The measurement of the result was accomplished by five 
tests, one given before any reading and one following each page read, 
making it possible to plot a learning curve for each of the four-page 
sequences. 

¹ Rugg, Harold: A Preface to the Reconstruction of the American School 
Curriculum. Teachers College Record, March, 1926. 
² The Social Science Pamphlets. 

The Reading Material.—The four stories of the textbook narrative 
type were designated N-1, N-2, N-3, and N-4. Each was a brief, 
simple, chronological account of the facts of the concept selected, 
“The Industrial Revolution.” A portion of one of them, N-1, follows: 


THE INDUSTRIAL REVOLUTION, N-1 


In England, in the middle of the 1700’s a great change took place, which 
introduced new and quicker ways of making things. This change is called “The 
Industrial Revolution.” 

Before the Industrial Revolution, manufacturing had always been done in 
the homes of the workers. In the making of cloth, for example, the thread was 
first spun from the wool or cotton on a simple machine, operated by hand power, 
known as the spinning wheel. Then the thread was woven into cloth, but this, 
too, was done on a simple, hand-operated machine. 

Several great inventions changed all this. Cartwright invented the power- 
loom, which was able to weave cloth by machinery. James Watt invented the 
steam engine, which supplied power to run the power-looms. This led to the 
building of mills and factories. The new machines were too large and too heavy 
for the home workers to buy. Only rich men could start factories. They bought 
the machines and hired other people to work for them. The goods produced by 
the factories were better and cheaper than those produced in the homes. 


The other three “narratives” were of the same type and differed 
very little from N-1 in content, being largely verbal rearrangements. 

For the “episode” it was found impossible to employ exactly 
the type of reading material that the makers of the new curriculum 
mean by the dramatic episode. Such material from the actual lives 
or writings of contemporaries of the event being presented, could 
not be compressed into the one page demanded by the experimental 
procedure. Two synthetic “episode” stories were therefore written. 
These are episodic in form, but perhaps lack the essential spirit of 
reality. This shortcoming must be considered in the interpretation 
of the experiment. The stories are designated, therefore, as episodic 
narratives, E-1 and E-2. A portion of E-1 is given below. E-2 was 
similar in form, but concerned the shoe industry. 


THE INDUSTRIAL REVOLUTION, E-1 


Would you like to turn back the calendar a hundred and fifty years and see 
how goods were manufactured then? Suppose we look into the home of a worker 
of 1770. 

It was a busy work day in the home of John Carver, who was a weaver, or 
maker of cloth. On one side of the great kitchen his wife, Jenny Carver, sat 
spinning thread. It was a slow process. Around and around went the spinning- 
wheel, twisting the loose cotton into a fine strong yarn. But only one thread 
was made at a time. It took all day to spin four pounds of cotton into thread 
ready to be woven . . . 

That is how goods were manufactured in 1770. But in the forty years following 
came a great change! By 1810 the home industries had almost passed 
out of existence and the factory had become the center of manufacture. This 
great change is called The Industrial Revolution. 

Here is a story of the daily life of John Carver’s son, in 1810. See how differ- 
ent his work was from that of his father. 

Alfred Carver, his wife, and young Thomas, his twelve year old son, were up 
early in the morning. They had to get up early every morning, for their work at 
the factory began at seven o’clock. Their rooms were small, not at all like the 
great kitchen of Alfred’s father. They had moved into the city of Leeds, where 
the great cloth mills were. 

It was dark and smoky outside as they made their way to the mill. Alfred 
Carver went to the room where the humming, clanking looms were. Like his 
father before him, he was a weaver, but in what a different way! In the vast 
mill there were two hundred looms, turned by huge steam engines. Carver 
looked after twenty of them . . . 


The remaining two stories were of a condensed style of summary 
or generalization, designated as G-1 and G-2. A portion of G-1 
follows: 


THE INDUSTRIAL REVOLUTION, G-1 


About two hundred years ago in England a great change was taking place. 
This change, which introduced new and quicker ways of manufacture, is known as 
“The Industrial Revolution.” 

Up to this time practically all goods such as cloth and shoes were made in 
the home. The workman’s home was his shop, and he slowly made his product, 
doing each step in the manufacture himself. For making his goods he used tools 
and simple machines. In making cloth, the cotton was first spun into thread 
on a spinning wheel, and then woven on a hand loom. These simple machines 
were operated by the workman himself. Almost everything was done by hand. 
Since the process of hand manufacture was slow, not much goods was produced. 
THEN CAME THE INDUSTRIAL REVOLUTION 


The Industrial Revolution was a great change from the system by 
which goods were produced on a small scale by the use of tools and 
simple machines in the homes of the workers to the system by which 
goods are produced on a very large scale with complicated machines 
in the factories and mills and shops of the modern world. 

The Industrial Revolution was largely caused by the invention of power 
machinery. In the last part of the 1700’s men invented machines to do the 
work that had been done by hand. The steam engine was invented then and 
was used to run the machines. 


By the use of the Teachers’ Word Book of E. L. Thorndike, the 
word difficulties of each of the stories were determined. In the 
Word Book the most common 10,000 words in the language, deter- 
mined by an extensive count, are listed, and with each word a “credit 
number” denoting the wideness and frequency of its occurrence. 
In Table I are given the data concerning the word and sentence 
difficulties of the stories. The eight units are seen not to differ in 
word difficulty to any appreciable extent or in any consistent direc- 
tion among the three types. Children of Grade VI may be expected 
to have vocabularies well in excess of 5000 words, as shown by several 
studies, and few words in any story were not in the most common 
5000. The episodes are somewhat longer, but the time allowed for 
reading was found sufficient for almost all children to finish. The 
length of sentences shows small differences. 


TABLE I.—WORD AND SENTENCE DIFFICULTIES OF READING MATERIALS 

Story                                   N-1   N-2   N-3   N-4   E-1   E-2   G-1   G-2
Number of words                         334   263   326   287   492   435   359   318
Per cent of words not in first [100]     47    47    48    46    39    48    43    46
Average credit number of words
  not in first 100                       64    55    66    72    71    70    67    74
Number of different words not
  in first [5000]                         3     3     5     2     5     4     2     4
Average length of sentences              16    18    18    15    18    14    14    12





























The Sequence Arrangements.—The sequences in which these units 
of reading material were arranged reflect the purpose of the experi- 
ment. The story sequences were as follows: 
Group I. Repeated Narrative.—This arrangement consisted of 
one story, N-1, occurring in all of the four positions. Each pupil in 
this group read N-1 four times, taking a test after each reading. This 
corresponds somewhat with the school practice of assigning a short 
unit to be read a number of times until “learned.” 

Group II. Varied Narrative.—Here the sequence was N-1, N-2, 
N-3, N-4. These stories were all of the narrative type, and differed 
only in wording and arrangement, the content being held as constant 
as possible. 

Group III. Episode and Generalization: Deductive.—This consisted 
of the new type stories in the order G-1, G-2, E-1, E-2. This 
roughly corresponds to logical deductive sequence, with the general- 
ization or summary preceding the episodic narratives. 

Group IV. Episode and Generalization: Inductive.—The sequence 
E-1, E-2, G-1, G-2, was used, the episodes preceding the generalization. 

Group V. Episode and Generalization: Alternating.—In this group 
the stories were arranged in the order E-1, G-1, E-2, G-2. 

Summarizing in a tabular form, the five experimental groups read 
the following sequences: 

Group I    Group II    Group III    Group IV    Group V
  N-1        N-1         G-1          E-1         E-1
  N-1        N-2         G-2          E-2         G-1
  N-1        N-3         E-1          G-1         E-2
  N-1        N-4         E-2          G-2         G-2

















The Tests.—The five one-page tests that comprised the measuring 
instruments of the experiment were of the controlled completion 
type. Fifty statements were devised from the factual content of 
the stories, and made into completions, each completion coming at 
the end of the statement, and being a word or short phrase. These 
were grouped in ten units of five elements each, and two units were 
taken to form a test. The five correct completion responses were 
placed, with four incorrect ones, below each group of statements, 
and the pupil responded by inserting the number of the correct 
response. A sample unit of the test is given: 
Put in the parentheses the number of the word that best completes the sentence. 

1. After the Industrial Revolution, many products were made in ....... ( ) 
2. One of the chief causes of the Industrial Revolution was the invention 
   of ................................................................ ( ) 
3. The Industrial Revolution was greatly aided by a plentiful supply of ( ) 
4. Before the Industrial Revolution most of the manufacturing was 
   done in ........................................................... ( ) 
5. After the Industrial Revolution most of the weavers became ........ ( ) 

1. farmers; 2. the steam engine; 3. factories; 4. Russia; 5. coal; 6. wage-earners; 
7. cloth; 8. homes; 9. big guns. 


The five one-page forms were designated as Tests a, b, c, d and e. 
A specimen test on the cover of the experimental booklet was used to 
instruct the pupils in its use. Very few failed to employ the correct 
method of answering. 

A check was made to be sure that no test element was answered in 
one sequence and not in another. Differences in results are essen- 
tially due to the form of the reading material, not to content. 

In order to determine the reliability of the tests they were admin- 
istered to 100 children outside of the main experimental groups, with- 
out the alternation of the reading material. The pupils of this group 
had some knowledge of the Industrial Revolution from having acted 
as subjects in a preliminary experiment. Each pupil took Tests a 
and b twice and c, d and e once. The order of the tests was rotated 
to compensate for practice effect, each test occurring in each position. 
From these data the self-correlations of Tests a and b were determined 
as .61 ± .04 and .71 ± .03. The correlations of the five tests with 
the composite of the other four were .65, .61, .55, .67 and .68, all 
±.04. These reliabilities may be considered satisfactory for 
ten-element, three-minute tests. 
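
The margins quoted after the self-correlations are not defined in the text. A minimal sketch, assuming they are probable errors of the kind customary in studies of this period, computed as 0.6745(1 - r²)/√n with n = 100 pupils; the function name is illustrative and not taken from the article.

```python
import math

def probable_error_of_r(r, n):
    """Classical probable error of a correlation coefficient.

    Assumption: PE = 0.6745 * (1 - r**2) / sqrt(n). The article does not
    say how its margins were obtained.
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# Self-correlations reported for Tests a and b, with 100 pupils.
for r in (0.61, 0.71):
    print(f"r = {r:.2f}, probable error = {probable_error_of_r(r, 100):.2f}")
```

Under that assumption the two self-correlations give margins of about .04 and .03, which agrees with the figures reported.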

Administration.—The experimental material placed in the pupils’ 
hands consisted of ten pages of mimeographed material as follows: 


Cover. Name and Directions 
A. Test 
B. Reading unit No. 1 
C. Test 
D. Reading unit No. 2 
E. Test 
F. Reading unit No. 3 
G. Test 
H. Reading unit No. 4 
I. Test 
The four reading units were different for each experimental group, 
following the sequences previously described. 

The same test was not used each time in the same position. Since 
the tests could not be assumed to be of the same degree of difficulty, 
it was considered desirable that each test should occur in each position 
within each experimental group. For this purpose the principal 
groups were divided into sub-groups, each having the same story 
sequence, but a different order of tests. The test sequences were: 


Sub-group a, Tests a, b, c, d, e 
Sub-group b, Tests b, c, d, e, a 
Sub-group c, Tests c, d, e, a, b 
Sub-group d, Tests d, e, a, b, c 
Sub-group e, Tests e, a, b, c, d 
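
The five test orders form a cyclic rotation, so that each test falls once in each position. A short sketch of how such a rotation can be generated and checked follows; the function name is illustrative only.

```python
TESTS = ["a", "b", "c", "d", "e"]

def cyclic_orders(items):
    """Return one rotation of `items` for each possible starting element."""
    return [items[i:] + items[:i] for i in range(len(items))]

orders = cyclic_orders(TESTS)

# Check the balancing property: across the five sub-groups,
# every test occupies every position exactly once.
for position in range(len(TESTS)):
    assert sorted(order[position] for order in orders) == TESTS

for label, order in zip(TESTS, orders):
    print(f"Sub-group {label}, Tests {', '.join(order)}")
```

Because every test occupies every position equally often within a group, differences in test difficulty are spread evenly over the five points of that group's learning curve.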


The time allowed for reading each story-unit was two and one-half 
minutes, and for each test three minutes. The reading time was found 
to be sufficient for all but the most laggardly. The method of the 
experiment made it necessary that the timing be kept with fair pre- 
cision and that the subjects should not refer to the stories or work on 
the tests before or after the allotted period. To insure this the 
material was stapled at two diagonally opposite corners, each sheet 
being torn loose as required. This prevented ‘‘working ahead.” As 
each sheet was quitted the pupil was instructed to place it on the 
floor beside his desk. This very effectively prevented such errors as 
would have been caused by pupils taking the reading time to work on 
the tests. As an additional caution the order of sheets was indicated 
by large letters, from A to I, as designated above, stamped in the 
upper right-hand corners. This made it possible for the examiner to 
see at a glance that every pupil was working on the correct page. 

In administering the experiment full directions were given for the 
controlled completion test by means of the sample on the cover sheet. 
The pupils then tore loose this sheet, placed it on the floor, and worked 
on the test in the A position for three minutes. The test sheet was 
then torn off, and the reading sheet in the B position was exposed. At 
the end of the reading time, this sheet was torn off. This process 
was continued for the remaining tests and stories. 

The Experimental Groups.—Approximately 1000 children partici- 
pated as members of the main experimental groups. These were 
pupils of the high sixth, seventh, and low eighth grades of five public 
schools in and near New York City. Socially they represented an 
average of the city. The largest number were from two quite typical 
schools; one smaller group was from a school largely attended by negro 
children, and another was from a superior school in a suburb. 

The members of the five experimental groups were made up by 
random selection in the classrooms in which the experiment was 
conducted. All of the material was identical in appearance and 
administration. The sets were so arranged that as they were dis- 
tributed the pupils in successive seats received the material for groups 
I, II, III, etc. The sub-groups, varying as to test arrangement, were 
similarly selected. The distribution was arranged so that no adjacent 
pupils received the same test sequence. The pupils were informed of 
this fact, which practically eliminated the possibility of “copying.” 

The total number of pupils in each of the five experimental groups 
was about 200. 


Preliminary investigation showed the necessity of making a 
selection from this number. The following types of pupils were 
eliminated: 

1. Pupils whose tests were unscorable, due to illegibility or failure 
to follow directions. 

2. Pupils who scored zero on any test except the first. Most of 
these cases were due to extreme lack of effort, or to failure to under- 
stand directions. Those eliminated for this reason were, in general, 
the dullest children. 

3. Pupils who scored ten (perfect) on any test except the last. 
This elimination was necessary, since, in “breaking the test,” these 
pupils could not demonstrate improvement after further reading. 

Very few pupils were eliminated because of the first three con- 
ditions. 

4. A fourth condition was made that the pupils should not know 
too much about “The Industrial Revolution” in advance of the 
experiment. On this principle all those making a raw score of more 
than four on the first test were excluded. Although this caused the 
loss of a large number of cases, it was deemed desirable to have all the 
subjects start from a low and fairly homogeneous degree of knowledge 
of the concept. 
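
The four elimination rules can be written as a single filter. This is only a sketch: it assumes each pupil's record is a list of five raw scores (0 to 10) in test-position order, with `None` standing for an unscorable booklet, and the function name is hypothetical.

```python
def keep_pupil(scores, max_first_score=4, perfect=10):
    """Return True if a pupil's record survives all four elimination rules."""
    # Rule 1: unscorable tests (represented here, by assumption, as None).
    if scores is None or len(scores) != 5:
        return False
    # Rule 2: a zero on any test except the first.
    if any(s == 0 for s in scores[1:]):
        return False
    # Rule 3: a perfect score on any test except the last.
    if any(s == perfect for s in scores[:-1]):
        return False
    # Rule 4: a raw score of more than four on the first test.
    if scores[0] > max_first_score:
        return False
    return True

records = [[2, 5, 7, 8, 10], [2, 0, 7, 8, 9], [2, 10, 7, 8, 9], [5, 6, 7, 8, 9]]
print([keep_pupil(r) for r in records])
```

Only the first of the four illustrative records is retained; the others fail rules 2, 3 and 4 respectively.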

In order to determine the comparative intelligence test scores of 
the groups, the “Multi-mental scale” was administered to about 500 
pupils in the school from which the largest number was taken. Another 
group of about 200 had just been tested with the same test. In the 
remaining schools use was made of the data available, mostly from 
group intelligence tests already administered. A very few cases were 
included for which no mental test data were available. 

The equating of groups by the method of random selection seems 
to have been quite successful. Table II gives the number of cases 
remaining in each group after the eliminations, and the mean Mental 
Age of these subjects. 


TABLE II.—EQUATING OF EXPERIMENTAL GROUPS 

                      Group I   Group II   Group III   Group IV   Group V
Number of cases         137       130         122         134       126
Mean MA (years)        11.99     11.93       12.14       12.00     11.95

















Scaling the Tests.—In order to eliminate the unevenness in the 
difficulty of the tests, the raw scores were transformed into scaled 
scores on the basis of the performance of 500 pupils in the main experi- 
mental groups. Twenty cases were taken at random from each of the 
25 sub-groups, and a distribution was made of the scores in each test, 
regardless of its position or sequence. This gave a distribution of 
scores which were made under varying conditions for each test, but 
under like conditions for the several tests. Thus, Test a was taken by 
20 pupils after reading N-1, by 20 after reading N-1 and N-2; and so 
on through the 25 sub-groups. Test b was taken similarly by 20 
pupils under each condition. These composite distributions were, of 
course, somewhat flatter than the normal curve. The moments of 
one of them were computed, and it was found to correspond quite 
closely to Pearson’s Type II frequency curve. 

The means and standard deviations of each of the composite dis- 
tributions were computed, and the deviation of each unit of raw 
score from the mean. These deviations were divided by the standard 
deviation of the distribution concerned, giving the scale value in terms 
of multiples of the standard deviation. In order to avoid negative 
quantities an arbitrary scale was fitted to the standard deviations 
scale by taking 10 as the mean and five units equal to one SD. The 
result of this scaling was to make the Means and SD’s in the com- 
posite distribution of each test equal. 
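
The scaling just described amounts to: scaled score = 10 + 5 × (raw score − mean) / SD, with the mean and SD taken from the composite distribution of the test concerned. A minimal sketch follows; the raw scores are invented for illustration, and the use of the population form of the standard deviation is an assumption, since the article does not say which form was used.

```python
from statistics import mean, pstdev

def make_scaler(composite_raw_scores, centre=10.0, units_per_sd=5.0):
    """Build a raw-to-scaled converter from one test's composite distribution."""
    m = mean(composite_raw_scores)
    sd = pstdev(composite_raw_scores)  # assumption: population SD

    def scale(raw):
        # Deviation from the mean in SD units, shifted to avoid negatives.
        return centre + units_per_sd * (raw - m) / sd

    return scale

# Hypothetical composite distribution of raw scores for one test.
composite = [1, 2, 3, 3, 4, 5, 5, 6, 7, 9]
scale = make_scaler(composite)
scaled = [scale(x) for x in composite]
print(round(mean(scaled), 2), round(pstdev(scaled), 2))
```

Whatever the raw distribution, the scaled scores of the composite group come out with mean 10 and standard deviation 5, which is what makes the five tests comparable.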

Results.—Table III gives the mean scaled score for each test 
position of each of the five groups. The Standard Errors of these 
means differed so little among themselves that they are adequately 
represented by an average SE for each test position. The average 
standard errors of differences were computed from the average SE’s 
of the means. 
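
The article does not give the formula used for the standard error of a difference. A sketch under the usual assumption of two independent means, where the SE of the difference is the square root of the sum of the squared SEs, is shown below; the input .38 is the average SE of the means reported for the second test position.

```python
import math

def se_of_difference(se_a, se_b):
    """Standard error of the difference between two independent means."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

# Average SE of the means at the second test position (Table III).
se_mean = 0.38
se_diff = se_of_difference(se_mean, se_mean)
print(round(se_diff, 2))
```

With equal SEs this reduces to √2 times the SE of a mean, which gives about .54 at the second position, in agreement with the tabled value. Small discrepancies at other positions are to be expected if the tabled figures were computed from unrounded SEs.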


TABLE III.—RESULTS: MEAN SCALED SCORES OF VARIOUS GROUPS AND POSITIONS 

                                           Test positions
                                   1       2       3       4       5
Group I                          4.91    9.66   10.95   11.74   12.40
  Story read                              N-1     N-1     N-1     N-1
Group II                         4.28    9.58   11.39   11.41   12.39
  Story read                              N-1     N-2     N-3     N-4
Group III                        4.90   10.18   11.67   12.21   12.78
  Story read                              G-1     G-2     E-1     E-2
Group IV                         5.08    9.04   11.12   12.25   11.91
  Story read                              E-1     E-2     G-1     G-2
Group V                          5.04    9.30   11.76   12.20   13.28
  Story read                              E-1     G-1     E-2     G-2
Average SE of means               .25     .38     .39     .37     .38
Average SE of differences         .35     .54     .55     .53     .54
Three SE differences             1.04    1.63    1.64    1.58    1.61




















These data are also presented by a graph, Fig. 1. The five scores 
of each group have been plotted as a learning curve.¹ 

¹ In order to compare the gains due to learning with those due to practice 
effect on the tests the means of five successive tests taken by the outside group 
of 100 pupils were determined. The gains of the successive tests over the first 
were as follows: Second 2.48; third 1.52; fourth 2.35; fifth 2.13. Making allowance 
for chance fluctuations, it seems that the maximum gain due to familiarity 
with the test is less than 2.5 points, and is distributed in decreasing amounts 
over the first three or four tests. Probably about half of the gain of the main 
groups from the first test to the second must be ascribed to practice effect. The 
point is not important, as no conclusions are drawn from the absolute values of 
the scores. 

Discussion.—Both the table and the graph show the small size 
of the differences discovered. The gains made from the first test to 
the second show the differences between the single types of “story,” 
N-1, E-1 and G-1. The gain made by those who read G-1 is .53 
points greater than the gain made by those who read N-1. The 
gain from reading N-1 is .73 points greater than the gain from reading 
E-1. The comparison of these differences with their standard 
errors shows that the probability that G-1 is “better” than E-1 is 
.96 or 25 to 1. The probability that G-1 is superior to N-1 is .75 
or 4 to 1; and the probability that N-1 is superior to E-1 is .88 or 
8 to 1. These probabilities are very much lower than is demanded 
by sound statistical practice. At most we can only suspect that 
there is a real positive difference between G-1 and E-1; the other 
differences being inconclusive and very possibly due to chance. 

The gains from the first test to the last, comparing the merits of 
the five sequences, are equally inconclusive. The gain due to the 
best combination of the episodic narrative and generalization is only 
.75 points greater than the gain due to reading the same textbook 
narrative story four times. This difference is 1.39 times its standard 
error, making the probability that it is real and positive only about 
.92 or 12 to 1. 
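
The step from a ratio of 1.39 standard errors to a probability of about .92 can be reproduced with the normal curve. The sketch below assumes that the probability is the one-sided area of the normal distribution below the observed ratio; the article does not state its method, and the function name is illustrative.

```python
from statistics import NormalDist

def chance_difference_is_positive(ratio_to_se):
    """Area of the standard normal curve below the ratio difference / SE.

    Assumption: this one-sided normal area is what the article reports
    as the probability that the true difference is real and positive.
    """
    return NormalDist().cdf(ratio_to_se)

p = chance_difference_is_positive(1.39)
odds = p / (1 - p)
print(round(p, 2), round(odds, 1))
```

On this reading a ratio of 1.39 gives a probability of about .92 and odds of roughly 11 to 1, close to the figures quoted. A ratio of 3, the level shown in the "Three SE differences" row of Table III, would give a probability above .998.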

A further analysis of the results, comparing the gains due to each 
story with the others of the same position, gives the same finding, 
that the differences are small compared to their standard errors. 

Conclusions.—The very evident conclusion seems to be that, 
under the conditions of the experiment, it makes little difference 
which type of story or which sequence is read. This smallness of 
differences gives cause for suspicion that some very important factors 
in learning were constant for the various groups, and did not depend 
on the reading material. Two such factors may be found in the 
problem situation created by the experiment, and in the amount of 
practice. 

The Problem Situation.—One factor in learning in the social 
studies is the urge or motivation behind the efforts of the pupil. 
In the experiment this motive appeared to be the same for all the 
groups, regardless of the stories that they read. It was directed to 
the tests rather than to the reading material itself. The children 
seemed to consider the tests of prime importance and the stories 
rather secondary. The problem situation of “How can I get a high 
score on the test?” entirely outweighed the problem of the Industrial 
Revolution. The fact of the almost equal achievements of the 
experimental groups must be considered in the light of this sameness 
of motivation. In school situations, however, the urge provided 
by the continued testing would be absent. The interest of the pupil 
and the problem attitude must be awakened by other means, by 
the skill of the teacher or by the use of interest-provoking reading 
material. 

Practice.—The experiment shows very clearly that, at least in 
regard to the limited function tested, repetition, practice, is a most 
important element in learning a social science concept. The gains 
are proportional more to the number of times that the facts are read 
than to any other factor. It is necessary, then, in arranging reading 
materials in the social studies, to provide for sufficient practice. 
The facts must be met not once, but again and again to secure ade- 
quate understanding. The shape of the learning curves implies 
that even four repetitions have not exhausted the possibilities for 
gain, as they show no sign of having reached the limit of improvement. 
In the experiment, practice was equally well secured by reading one 
narrative a number of times or by reading several varied selections. 
In the classroom, or under ordinary study conditions, however, the 
first procedure is undesirable. The pupils would not read the single 
narrative enough times to secure the necessary practice. They would 
rebel against such a dry, uninteresting procedure. The use of longer 
selections, which in themselves would give the necessary repetition 
of the concepts, but which the pupil read only once, is necessary to 
avoid this evil. 

So much for the evident implications of the experimental evi- 
dence. There seems to be another important factor, however, which, 
although it is more speculative, is very pertinent to this discussion. 

Relation of Reading Material to Objectives.—Achievement must 
always be considered in terms of the end or aim of instruction. In 
an experiment such aim is implied by the tests which measure the 
outcome. The tests here used measured a part, but only a small 
part, of the object of instruction in the social studies, namely the 
ability to complete general statements concerning the concept being 
taught. It is interesting to note that such small differences as do 
exist place the individual stories in an order of merit which is also 
the order of their resemblance to the tests: Generalization, first; 
Narrative, second; and Episode, third. This is in accord with the 
psychological principle that the function should be practiced in 
the form in which it is to be used. To fulfil the other objectives of the 
social studies, to give pupils an understanding of social concepts and 
problems that will function in life situations rather than in paper 
situations, quite different reading materials might well be superior. 
For this aim the dramatic living treatment of social themes seems 
to hold the most elements in common with the dynamic outcomes 
desired. 
A modification of the concept of practice is also necessary when 
consideration is given to the more general objectives. When the 
aim desired is the understanding of broader trends and generaliza- 
tions, rather than details, these wider themes should be the subject 
of the repetition. The minutiae should vary and be forgotten; the 
broad concepts presented again and again in varied aspects will be 
remembered. 

These three positive implications concerning the psychology 
of learning in the social studies are not unlike the principles under- 
lying learning in other fields that have been investigated. 

1. Learning will be most economical when in response to a problem 
situation felt by the pupil. 

2. The amount learned is, other things being equal, proportional 
to the number of repetitions, which for the achievement of the broader 
aims, should be in varied settings. 

3. The materials of instruction should be similar in form and 
content to the actual objectives desired. 

But does this mean that teaching in the social studies should 
be reduced to a series of practice exercises, giving repetition in
“problem” situations? It does not. The objectives in the social studies
are in life, not in the ability to reproduce, complete or select on a test 
blank. The problems are intrinsic in the social situations, and not 
to be defined in terms of test scores. The practice is necessary, but 
should be so arranged that the general understandings and atti- 
tudes desired are practiced, rather than the details. 
OBSERVATIONS ON THE VALIDATION OF THE
GROUP WILL-TEMPERAMENT TEST 


JUNE E. DOWNEY 
University of Wyoming 


In a recent article in this Journal, the reliability of the will-temperament
tests was discussed at some length.¹ In the present paper I wish
to discuss certain phases concerning the validation of the tests.² This
problem is a most difficult one to handle partly because of uncertainty 
as to the criteria that should be used in checking the separate tests and 
the test as a whole. 

The method used by most investigators has been the calculation of 
coefficients of correlation between the scores received in the different 
tests and the judgments passed on the subjects of the test by one or 
more relatives, teachers or acquaintances. In passing judgments on 
individuals in this manner it is necessary to proceed on the basis of 
one’s understanding of what the terms used in the will-temperament 
test mean. There are several sources of error in the procedure as a 
whole: (1) The tests themselves may be incorrectly labeled as tests of 
certain traits; (2) the trait names used may be understood in a different 
way by the author and by other investigators; (3) a certain amount of 
inaccuracy in estimating traits even of intimate acquaintances is 
unavoidable. It is doubtful if this method can be relied upon to give 
conclusive results, at least in the present stage of investigation. It is, 
however, the natural way in which to attempt to check the validity of 
the will-temperament test. 

It seems to the author that the best method to use in determination 
of the value of the will-temperament tests is the discovery, if possible, 
of certain situations in which one or more of the tests have differential 
value. Some investigators have approached the problem from this 
angle, notably Miss Bryant, who by careful comparison of delinquent 
and non-delinquent boys was able to suggest that certain of the individual
will-temperament tests show diagnostic significance so far as the
two groups were concerned.³





1 Downey, J. E., and Uhrbrock, R. S.: Reliability of the Group Will-Tem- 
perament Tests. Journal of Educational Psychology, Vol. XVIII, No. 1, 1927, 
pp. 26-39. 

2 This report, just as the one on Reliability, was made possible by a grant 
from the National Research Council by its Committee on Migration Research. 

3 Bryant, E. K.: Delinquents and Non-delinquents on the Will-temperament
Tests. Journal of Delinquency, Vol. VIII, 1923, pp. 46-63.
The present report relates to the group tests only. The correlations
reported were made with raw scores rather than with the published 
norms. Moreover, various new possibilities in method of scoring were 
tried out, some 60 forms being utilized. For a list of these forms and 
further details the paper on reliability of the tests should be consulted.¹

In the investigation both of the reliability and of the validity of 
the tests, one factor has not been adequately reckoned with and, in 
fact, remains an outstanding problem. I refer to the effect of age and 
of maturity upon scores made in the tests. 

The first question raised in the present investigation was the effect 
of intelligence on will-temperament scores. Coefficients of correlation 
between intelligence tests and raw scores for the various will-tempera- 
ment tests were calculated for three different groups: 42 high school 
boys, 37 high school girls, and 149 normal-college women. The 
IQ’s of the high school boys and girls were calculated from the National
Intelligence Test, Form I. Thorndike scores were used for the normal-college
women.

Table I gives the data for those correlations of the more than 60 
obtained which were +.20 or above. As a matter of interest the 
negative correlations of the same range were also included. In general,
the correlations are too low to be significant, a statement which holds 
without exception for the older group of normal-college women. Possi- 
bly for the high school groups the positive correlation with intelligence 
of the following tests is worthy of consideration, namely, speed of move- 
ment and speeded movement, motor impulsion and non-compliance. 

Correlations of intelligence tests with school grades range in general 
between +.40 and +.60. It is urged that certain personality traits 
must influence school grades, and the desire to find personality tests
that can be combined with tests of intelligence in such a way as to 
raise the correlation with grades accounts for much of the present day 
preoccupation with personality tests. Many investigators have 
assumed that school success may be used as the criterion for checking 
the validity of supposed tests for energy, drive, and persistence. 

As a matter of fact we do not know how personality traits other 
than intelligence may affect school success. We can not even say that 
given personality traits affect school grades in the same way with all 
teachers and under all systems. The interplay of personality factors 
so far as teacher and pupil are concerned is extraordinarily subtle.
Extreme suggestibility and great speed of reaction may influence some 





1 Loc. cit., 30f. 
TABLE I.—CORRELATIONS BASED ON FIRST GIVING OF DOWNEY GROUP WILL-TEMPERAMENT TEST WITH INTELLIGENCE SCORE

(All Correlations Positive unless Otherwise Indicated)

Test name, number and method of scoring
[The columns of coefficients for the three groups are illegible in this copy.]

Writing name, II, 2 and III: Ratio, name rapid to slow; letter count
Writing name, II, 2 and III: Ratio, name rapid to slow; total mm. measure
Coordination of impulses, Test V: Writing United States of America rapidly on short [illegible]
Speed of movement, II, 1: Name, usual; letter count
Speed of movement, II, 1: Name, usual; total mm. measure
Speed of movement, II, 2: Name, rapid; letter count
Speed of movement, II, 2: Name, rapid; total mm. measure
Speed of movement, VI, 1: United States of America, usual; letter count
Speed of movement, VI, 1: United States of America, usual; total mm. measure
Speed of movement, VI, 1: United States of America, usual; average mm. measure
Speed of movement, VI, 2: United States of America, rapid; letter count
Speed of movement, VI, 2: United States of America, rapid; total mm. measure
Speed of movement, VI, 2: United States of America, rapid; average mm. measure
Motor inhibition, Test III: Letter count
Volitional perseveration, VIII, 2: United States of America; time in seconds
Motor impulsion, Test X, 1: Name, eyes open, usual style and speed; total mm. measure
Motor impulsion, Test X, 1: Name, eyes open, usual style and speed; average mm. measure
Motor impulsion, Test X, 2: Name, eyes closed, usual style and speed; total mm. measure
Motor impulsion, Test X, 2: Name, eyes closed, usual style and speed; average mm. measure
Motor impulsion, Test X, 3: Name, eyes on examiner’s pencil while counting taps; total mm. measure
Motor impulsion, Test X, 3: Name, eyes on examiner’s pencil while counting taps; average mm. measure
Motor impulsion, Test X, 4: Name, eyes open while counting number of times examiner repeats word “fly”; average mm. measure
Self-confidence, Test XI: Number doubly [illegible]
Non-compliance, Test XII: old method
Finality of judgment, Test XIII: number of [illegible]

teachers favorably, others unfavorably. Inertia is interpreted by some
teachers as laziness, by others as a sign of thoroughness. Even very
superior intelligence may influence school success variously, depending
upon the level of intelligence of the teacher. If an original or profound
student’s reaction cannot be understood by a teacher because
of the teacher’s own limitations, the grades of the student will no
doubt suffer.

I have attempted to determine the extent to which certain tests of 
the will-temperament series influence grades by use of partial and 
multiple correlations. Unfortunately my program could not be carried 
through completely because of the fact that the grades received by the 
groups investigated did not always fall into a normal distribution 
curve. For example, the grade-curve of the 149 normal-college 
women was so definitely skewed that this material could not be used. 
A low mark was the exception. This circumstance eliminated my 
largest group and the one which on other accounts had been most 
promising. For this group, grades and Thorndike scores correlated 
only +.345 ± .05.
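
The probable error attached to the coefficient just quoted can be checked with a short sketch. It assumes the classical formula PE = 0.6745(1 - r²)/√N; the paper does not say which formula was used, so the function name and the choice of formula are assumptions made for illustration.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Classical probable error of a correlation coefficient.

    Assumed formula (not stated in the paper): 0.6745 * (1 - r^2) / sqrt(n).
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Values taken from the text: r = +.345 for the 149 normal-college women.
pe = probable_error_of_r(0.345, 149)
print(f"PE = {pe:.3f}")
```

Under that assumption the result rounds to .05, which agrees with the figure printed for this group.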

The complete calculation was finally carried through for one group 
only, namely the 42 high school boys. In this group the grading 
conformed fairly well to a normal distribution. I had calculated not 
only the correlation for raw scores on each separate test with grades 
and with intelligence but also had worked out the correlation with 
grades when intelligence was kept constant in the case of those tests 
that proved fairly reliable as shown by the self-correlations and 
the probable error. Table II gives the highest correlations with 
grades when intelligence is kept constant. It will be noted from Table 
II that Test VI-2 (writing the phrase “United States of America” as
rapidly as possible), correlated +.46 with school grades when intelli- 
gence was kept constant. This test is especially interesting since 
another investigator² has reported that it is the best of the will-temperament
speed tests from the standpoint of the correlation with a
Composite Speed Criterion. 
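
Keeping intelligence constant, as described above, corresponds to a first-order partial correlation. The sketch below uses the standard formula; the three input coefficients are hypothetical values chosen only to show the arithmetic, and are not figures from this study.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical inputs (not from the paper):
# x = grades, y = a will-temperament test score, z = intelligence.
r_grades_test = 0.40
r_grades_iq = 0.50
r_test_iq = 0.10

example = partial_r(r_grades_test, r_grades_iq, r_test_iq)
print(f"partial r = {example:.3f}")
```

When the test is nearly uncorrelated with intelligence, as the will-temperament tests are reported to be, the partial coefficient stays close to the raw correlation with grades or slightly exceeds it.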

Next, a multiple correlation was computed, using the three vari- 
ables represented by (a) Grades, (b) Intelligence Quotient, and (c) 
Test VI-2 from the Downey Group Will-Temperament Test. The 
method of scoring this test is particularly important, i.e., on the basis
of total millimeter measure. Table III shows the means, standard 
deviations, and the inter-correlations for the three variables. 





1 The actual calculations were made by Professor O. H. Rechard of the Depart- 
ment of Mathematics, University of Wyoming. 

2 Uhrbrock, Richard S.: “An Analysis of the Downey Will-Temperament Tests.”
Teachers College, Columbia University Contributions to Education, No. 296. 














596 The Journal of Educational Psychology 


TABLE II.—PARTIAL CORRELATION OF WILL-TEMPERAMENT TESTS WITH GRADES KEEPING INTELLIGENCE CONSTANT

Forty-two high school boys

Test name, number and method of scoring                                                              r

Writing name, II, 2 and III: Ratio, name rapid to slow; letter count ............................. +.31
Speed of movement, II, 2: Name, rapid; total mm. measure ......................................... +.26
Speed of movement, VI, 1: United States of America, usual; letter count .......................... +.25
Speed of movement, VI, 1: United States of America, usual; total mm. measure ..................... +.35
Speed of movement, VI, 2: United States of America, rapid; average mm. measure ................... +.23
Speed of movement, VI, 2: United States of America, rapid; total mm. measure ..................... +.46
Motor impulsion, Test X, 1: Name, eyes open, usual style and speed; [illegible] .................. +.34
Motor impulsion, X, 2: Name, eyes closed, usual style and speed; total mm. measure ............... +.33
Motor impulsion, X, 3: Name, eyes on examiner’s pencil while counting taps; [illegible] .......... +.29








TABLE III.—SHOWING MEANS, SD’S AND INTER-CORRELATIONS FOR THREE VARIABLES

                      (1) Grades     (2) Intelligence    (3) WT Test VI-2
Mean                  76.71          102.61              294.2 mm.
SD                    [illegible]    [illegible]         69.4 mm.
Inter-correlations    [illegible; only the fragments “.38” and “.03” survive]








The multiple correlation between (a) Grades on the one hand, and (b) 
Intelligence plus the score in Test VI-2 on the other, is +.69. This 
is by far the most significant relationship that has been reported where 
intelligence test scores have been supplemented by will-temperament 
test scores. 
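
A multiple correlation of one criterion on two predictors can be obtained from the three inter-correlations alone. The sketch below applies the standard three-variable formula; because the inter-correlations of Table III are largely illegible in this copy, the inputs are hypothetical and serve only to illustrate the computation.

```python
import math


def multiple_r(r12: float, r13: float, r23: float) -> float:
    """Multiple correlation of variable 1 on variables 2 and 3.

    R^2 = (r12^2 + r13^2 - 2*r12*r13*r23) / (1 - r23^2)
    """
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)


# Hypothetical inputs (not the paper's values):
# 1 = grades, 2 = intelligence, 3 = will-temperament test score.
R = multiple_r(r12=0.50, r13=0.40, r23=0.10)
print(f"R = {R:.3f}")
```

The formula shows why a test that correlates with grades but hardly at all with intelligence raises the multiple correlation well above either single coefficient.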

Since the tests give a low correlation with intelligence and many of 
them are fairly reliable, it would seem that situations should be 
discovered in which they could be used in a differential way. Possibly 
such an investigation should at first concern itself with a comparative 
study of groups that are very unlike one another in behavior. Delin- 
quent and non-delinquent groups suggest themselves as excellent 
material and also normal and psychopathic groups. Vocational groups
should also be utilized. 

Since the published tests are not well adapted to testing certain 
of these groups, a non-verbal will-temperament test was devised which 
reduces to a minimum the educational factor and relies upon a simpler 
movement than handwriting.¹

The tentative form of this non-verbal test is not satisfactory in so 
far as some of the sub-tests do not show a sufficiently high reliability 
as tested by self-correlations and many do not correlate highly with 
corresponding tests of the verbal group will-temperament series. 
Frequently, however, there is an obvious reason for this latter result 
because of differences between the two tests in the way of content 
or technique. In spite of inadequacies of the non-verbal form, results 
obtained from its use on two very unlike groups may be cited to 
illustrate their difference in reaction to it. Again it should be stated 
that the effect of age upon scores has not been satisfactorily canvassed. 
The tests gave low correlations with intelligence. 

Table IV summarizes the comparative results for a group of 
delinquent (reformatory) and non-delinquent (public school) girls in 
terms of the percentage of the latter whose scores equal or
exceed the median of the delinquents in the case of the most reliable
tests of the non-verbal form. A more satisfactory comparison could 
be instituted between two groups the members of which had been 
paired for both chronological and mental ages and possibly also with 
respect to social status but such data are not at present available. 
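
The comparison used in Table IV, the percentage of one group equalling or exceeding the median of the other, may be sketched as follows. The raw scores are not given in the paper, so the two score lists are invented for illustration, and the function name is hypothetical.

```python
from statistics import median


def percent_at_or_above_median(group, reference):
    """Per cent of `group` whose scores equal or exceed the median of `reference`."""
    cut = median(reference)
    return 100.0 * sum(1 for score in group if score >= cut) / len(group)


# Invented scores, for illustration only.
delinquent_scores = [10, 12, 14, 16, 18]   # median is 14
non_delinquent_scores = [9, 14, 15, 20]

pct = percent_at_or_above_median(non_delinquent_scores, delinquent_scores)
print(f"{pct:.2f} per cent equal or exceed the delinquent median")
```

A figure near 50 indicates no difference between the groups on that test; figures well above or below 50 mark the tests with differential value.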

Tentatively we may conclude that the following tests of the non-verbal
will-temperament series have some differential value so far as
delinquent and non-delinquent girls of the specified age are concerned:
speed of movement, ability to speed, expansiveness, drive, self-confidence,
reaction to suggestion, possibly sticking to decisions. The
non-delinquent girls were speedier, more able to speed under pressure, 
and showed more expansiveness, drive and self-confidence. The 
delinquent girls were possibly more suggestible and did not stick to their 
decisions as well, that is, they spent more time in reconsidering their 
decisions. 

A comparison was also instituted between two older groups of 
delinquent and non-delinquent girls, 17 and 18 years of age. There 





1 For a preliminary report on the Non-verbal Will-temperament Test, see
article by R. S. Uhrbrock and J. E. Downey in the Journal of Applied Psychology,
Vol. XI, 1927, pp. 95-105. 
TABLE IV.—NON-VERBAL WILL-TEMPERAMENT TEST

Comparison of Medians, 99 Public School Girls and 74 Delinquent Girls, 13, 14,
15, 16 Years in Age

Name of test                  Per cent of non-delinquent equalling or
                              exceeding median of delinquent
[illegible]                   59.39
[illegible]                   73.73
[illegible]                   79.79
[illegible]                   59.28
[illegible]                   71.30
[illegible]                   67.06
[illegible]                   71.70
Reaction to suggestion²       44.10
[illegible]                   20.20 (reliability low)

1 The higher the raw score the less the power of inhibition.
2 Number of suggestions taken.
3 Time spent on reconsideration, the more time the higher the raw score.


were 24 in the former group and 19 in the latter. Since the groups
were so small the mean scores were calculated. The differential tests
this time appeared to be ability to speed, ability to hold back, expansiveness
and self-confidence in favor of the non-delinquent girls; while
as before the delinquent girls were slightly more suggestible.


TABLE V.—NON-VERBAL WILL-TEMPERAMENT TEST

Comparison of Medians, 84 Public School Boys and 72 Delinquent Boys, 14, 15,
16 Years in Age

Name of test                  Per cent of non-delinquent equalling or
                              exceeding median of delinquent
[illegible]                   23.33
[illegible]                   38.09
[illegible]                   45.24
[illegible]                   48.09
[illegible]                   52.71
[illegible]                   65.11
[illegible]                   40.47
Reaction to suggestion²       57.02
[illegible]                   60.71

1 The higher the raw score the less the power of inhibition.
2 Number of suggestions taken.
3 Time spent on reconsideration, the more time the higher the raw score.

A comparison of delinquent (reformatory) boys with non-delinquent 
public school boys gave in some respects a reversal of the result with 
the girls. See Table V. 

The delinquent boys appear to be more hyperkinetic than the 
delinquent girls. They are actually higher in intelligence and may 
represent quite different forms of delinquency. Obviously the results 
have value only because illustrative of a possible method of attempting 
to get at the significance of the tests. 

I have at hand also records on the non-verbal will-temperament 
test from 27 feeble-minded boys and 22 feeble-minded girls for compari- 
son with the median score of boys and girls of 13, 14, 15 and 16 years of 
age. The chronological ages of the boys ranged from four under 12 
years to six over 20 years, the largest number being centered around 
the ages 14 and 15. The chronological ages for the girls ranged from 
14 years to six over 20 years; the majority of the cases were at the ages 
14, 15, 16 years. In the feeble-minded group the mental ages for the
boys ranged between 4 and 11-3; for the girls, from 6 to 12. Seventeen
of the girls gave mental ages between 8 and 12. The results are
summarized in Table VI.


TABLE VI.—NON-VERBAL TEST, FEEBLE-MINDED
Per Cent Equalling or Exceeding Median of Normal

Name of test                  Twenty-seven boys    Twenty-two girls
[illegible]                   18.51                68.18
[illegible]                   13.04                45.45
Ability to speed              0.00                 9.09
Ability to hold back¹         100.00               95.45
[illegible]                   25.00                13.63
[illegible]                   25.00                9.09
[illegible]                   11.53                45.45
Reaction to suggestion²       61.53                76.19
[illegible]                   92.59                95.45

1 The higher the raw score the less the power of inhibition.
2 Number of suggestions taken.
3 Time spent on reconsideration, the more time the higher the raw score.


There are a number of outstanding differences between the feeble- 
minded and normal groups. For example, not one feeble-minded boy 
equals the median score of the normal group of boys in ability to speed
and only 9 per cent of the girls equal or exceed the median of normal
girls. Conversely 100 per cent of the feeble-minded boys and 95 per
cent of the feeble-minded girls equal or exceed the normal boys or
girls in their raw score on ability to hold back. A low score on this
test means, it will be recalled, power to inhibit movement; a high score, 
the reverse. The feeble-minded also run low on expansiveness and 
drive, but high on reaction to suggestion and sticking to decisions. In 
this latter case the higher the raw score the longer the time spent in 
reconsidering decisions. 

The feeble-minded individuals obtain excessively high raw scores 
on the test entitled persistence, a test which has not been included in 
the tables because of its low reliability. This test measures a tendency 
just to keep ‘‘ plugging away,” without any necessary reference to a 
successful activity. Such perseverative activity, frequently in a
stereotyped form, is characteristic of subnormal individuals. My 
reason for including such a trait among those designed to test the 
dynamic pattern is that conjoined with a fertile intellect it might be a 
great asset. In general, in school work it is probably a liability. It is
not synonymous with industry.

The attempt at validating the will-temperament tests as outlined 
in the above paper is obviously sufficiently positive to encourage 
further experimental work. We believe that certain significant 
indications point to results in the future. Certain speed tests, especi- 
ally those described as ability to speed, expansiveness and drive 
appear to have differential value. The great problem is to discover 
just those situations in which the tests have significance. 





INTELLIGENCE OF THE CONTINUATION SCHOOL 
PUPILS OF WISCONSIN 


JOSEPH SUDWEEKS 


Brigham Young University, Provo, Utah 
INTRODUCTION 


1. The Problem.—The problem with which this study has to deal 
may be stated in the two following questions: 


How does the intellectual capacity of continuation school pupils compare with 
the intellectual capacity of pupils of the same ages in high schools? 


Does this new type of institution, the continuation school, have a distinct type 
of material with which to work? 


2. Method of Study.—During the school year 1923-1924, there 
were enrolled in the day sessions of the 44 continuation schools of 
Wisconsin 28,501 pupils. In the effort to get a representative group 
of this number for study, eight schools, situated in as many towns 
and cities of the state, were selected. The eight schools are well 
distributed over the state and represent a wide range in size of school. 
In these schools practically all the day students in attendance on the 
days that the tests were given were included in the study. The testing 
was all done between November, 1924, and March, 1925. 

The Terman Group Tests of Mental Ability, adapted especially 
to Grades VII to XII, were used to measure the intelligence or bright- 
ness of the population studied. Form A was used throughout. The 
Stanford Achievement Test, Advanced Examination, was also given 
to the pupils in one school. All of the testing was done under the 
direction of the directors of the schools, but the scoring was all done 
by the writer. 


THE RESULTS


1. The Data.—The distribution by ages and sexes of all pupils 
tested is given in Table I. 

The rapid fall off in numbers at 18 years of age is accounted for by 
the fact that attendance is compulsory only to that age. A glance at 
this table shows that the continuation schools draw the older and more
retarded portions of the population as compared with the high school.
TABLE I

AGES            BOYS     GIRLS    TOTALS
11              1        ...      1
12              ...      ...      ...
13              3        3        6
14              34       64       98
15              131      159      290
16              270      316      586
17              354      376      730
18              31       45       76
19              3        9        12
20              3        5        8
21              1        2        3
22 up           7        7        14
Totals          838      986      1824
Median ages     16-11    16-10    16-10.5


Table II shows for each school tested, in terms of IQ, the lower 
quartile, median and upper quartile points, quartile deviation, and 
also the number tested. 











TABLE II

School    Q1      Median    Q3      Q      Number tested
D         81.7    88.2      95.3    7      344
A         80.8    87.3      95.7    7.4    359
C         82.5    87        90.6    4      246
E         75      85.4      93.7    9.3    150
H         80.6    85        89.4    4.4    14
F         79      83.7      90.1    5.6    71
B         76.5    82.9      90.2    6.9    448
G         76.4    82.9      89.8    6.7    192
Total     79.1    85.5      93      6.9    1824
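
The quartile deviation in Table II can be recomputed from the two quartile points. The sketch below assumes the usual definition Q = (Q3 - Q1)/2, which the paper does not spell out, and uses the figures as read from the table.

```python
# (Q1, Q3, printed Q, number tested) for each school, as read from Table II.
table_ii = {
    "D": (81.7, 95.3, 7.0, 344),
    "A": (80.8, 95.7, 7.4, 359),
    "C": (82.5, 90.6, 4.0, 246),
    "E": (75.0, 93.7, 9.3, 150),
    "H": (80.6, 89.4, 4.4, 14),
    "F": (79.0, 90.1, 5.6, 71),
    "B": (76.5, 90.2, 6.9, 448),
    "G": (76.4, 89.8, 6.7, 192),
}


def quartile_deviation(q1: float, q3: float) -> float:
    """Semi-interquartile range: half the distance between the quartile points."""
    return (q3 - q1) / 2.0


for school, (q1, q3, printed_q, n) in table_ii.items():
    print(school, round(quartile_deviation(q1, q3), 2), printed_q)

total_tested = sum(n for (_, _, _, n) in table_ii.values())
print("Total tested:", total_tested)
```

On this assumption the recomputed values agree with the printed column to within rounding (the largest gap, 0.2, is for School D), and the numbers tested sum to the stated 1824.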




















The distribution by IQ of the whole group by chronological age is
presented in Table III. The middle half of the total number is found
between 79.1 and 93 IQ. Reference to this table shows the standing
of the age groups to be as follows:
The first four groups are above the median and the last two are below.
It is easy to account for the high comparative standing of the 18 and 
19 year old groups. They are beyond the compulsory age and hence 
are in school because they desire to be—probably because they realize 
the need for further training. The 15, 16, and 17 year groups are 
very largely those who have been eliminated from the full-time 
school—those upon whom the selective influence of the high school has 
operated; hence they stand lowest. Probably but few of the 98 
fourteen-year-olds have tried out the high school. Hence the selective 
influence has not operated there and the group ranks highest. 


TABLE III.—DISTRIBUTION OF IQ BY CHRONOLOGICAL AGES

(Entries marked [?] are illegible in this copy.)

IQ           11     13     14      15      16      17      18      19 up    Totals    Totals, per cent
120-124      [?]    [?]    [?]     [?]     [?]     [?]     [?]     [?]      1         .05
115-119      [?]    [?]    [?]     [?]     [?]     [?]     [?]     1        2         .10
110-114      [?]    [?]    [?]     3       4       9       1       1        19        1.10
105-109      ...    ...    5       8       11      23      2       3        52        2.85
100-104      [?]    [?]    [?]     11      30      43      7       2        107       5.85
95-99        [?]    [?]    [?]     29      48      73      9       2        177       9.70
90-94        [?]    [?]    [?]     38      80      99      9       6        248       13.60
85-89        [?]    [?]    [?]     50      102     137     20      10       342       18.75
80-84        [?]    [?]    [?]     62      134     143     11      8        378       20.65
75-79        [?]    [?]    [?]     49      79      91      8       1        237       13.00
70-74        [?]    [?]    [?]     28      60      74      3       2        168       9.20
65-69        ...    ...    ...     9       26      30      4       1        70        3.85
60-64        [?]    [?]    [?]     2       12      [?]     [?]     [?]      23        1.25
Totals       1      6      98      290     586     730     76      37       1824      99.95
Median IQ    ...    ...    89.8    84.6    84.2    85.7    87.5    88.2     85.5











In Table IV is presented the distribution by mental age and by sex.

The median mental ages show a difference of 2.6 months in favor 
of the girls. Table I shows the median chronological ages of the boys 
to be 16 years, 11 months; and of the girls, 16 years, 10 months. Thus 
the IQ of the girls is slightly higher than that of the boys. A computa- 
tion of the probable error of the difference in mental ages shows this 
difference to be insignificant.¹

Using the Mental Age Standards for Grading given by Professor 
Terman,² it is seen that these 1824 boys and girls between the ages





1 The difference is only 2.7 times its PE. 
2 Terman, L. M.: “The Intelligence of School Children.” (1919), p. 93.
13 to 19 cover in their mental development a range of from Grade IV
to Grade XIII—a spread of 9 school grades. About 80 per cent of 
the group have attained the age of 16 and hence there will be but little 
increase in mental ages. A grouping of those below 16 years of age 
and those above 16 years shows a median IQ of the former to be 86.2 
and of the latter 85.3. 


TABLE IV.—DISTRIBUTION OF BOYS AND GIRLS BY MENTAL AGES

(Entries marked [?] are illegible in this copy.)

                      Boys                  Girls                 Total
Mental age            Number   Per cent     Number   Per cent     Number   Per cent
18-6 up               2        .2           1        .1           3        .1
18 to 18-5            1        .1           2        .2           3        .1
17-6 to 17-11         9        1.1          4        .4           13       .7
17 to 17-5            15       1.8          16       1.6          31       1.7
16-6 to 16-11         18       2.1          17       1.7          35       1.9
16 to 16-5            33       3.9          38       3.9          71       3.9
15-6 to 15-11         33       3.9          43       4.4          76       4.1
15 to 15-5            53       6.3          [?]      [?]          [?]      6.6
14-6 to 14-11         60       7.2          84       [?]          144      [?]
14 to 14-5            89       10.6         117      11.9         206      11.3
13-6 to 13-11         88       10.5         [?]      [?]          [?]      [?]
13 to 13-5            83       [?]          116      11.8         199      10.9
12-6 to 12-11         109      13.0         132      [?]          241      13.6
12 to 12-5            72       8.6          78       8.0          150      8.2
11-6 to 11-11         72       8.6          68       7.0          140      7.7
11 to 11-5            53       6.3          39       4.0          92       5.4
10-6 to 10-11         27       [?]          19       1.9          46       2.5
10 to 10-5            20       2.4          [?]      [?]          [?]      1.7
9-6 to 9-11           1        [?]          1        .1           2        .1

Totals                838      99.6         986      99.7         1824     100.5
Median MA             13-4.7                13-7.3                13-6.5








The Stanford Achievement Test was given to 230 of the 246 pupils 
of school C (see Table II) to whom the intelligence test was given. 
Table V shows the results of this test in median scores and age equiva- 
lents for each subject. The age equivalents represent the ages of 
normal pupils who should be able to attain the corresponding scores. 
They are also the subject ages for the various subjects. Age equiva- 
lents are taken from the Revised Age and Grade Norms for the Stan- 
ford Achievement Test (July, 1924). 
TABLE V.—RESULTS OF STANFORD ACHIEVEMENT TEST, SCHOOL C

                                   Median    Subject age equivalents    Middle
Subject                            scores    Years      Months          half
[illegible]                        185       13         11              160-210
[illegible]                        219       14         1               179-242
Nature study, science              59        13         1               47-68
History and literature             39        13         0               28-55
[illegible]                        34        14         4               26-40
Spelling dictation                 151       14         2               127-170
Composite score and median:
  Educational age                  68        13         9               75-58
  Median chronological age                   17         1

Approximate educational quotient ........ 13-9 / 17-1 = 80
[illegible] ............................. = 87
[illegible] ............................. 80/87 = 92


The median ability in each school subject corresponds to the following grade 
placement: 


Reading—early VIII Nature and science—late VII 
Arithmetic—early VIII History, literature—early VII 
Language—middle VIII Spelling—early VIII 


Pupils of School C, as far as educational attainments are concerned, are about 
two years behind normal children. 
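
The quotients at the foot of Table V can be reproduced with a few lines of arithmetic. The sketch assumes that the educational quotient is educational age divided by chronological age, times 100, and that the final figure is the ratio of the educational quotient to the median IQ of 87; the label of that last row is illegible in this copy, so the second assumption in particular is a reading, not a certainty.

```python
def to_months(years: int, months: int) -> int:
    """Convert an age given in years and months to months."""
    return 12 * years + months


def quotient(numerator: float, denominator: float) -> float:
    """Ratio expressed as a percentage."""
    return 100.0 * numerator / denominator


# Educational age 13-9 and median chronological age 17-1, from Table V.
educational_quotient = quotient(to_months(13, 9), to_months(17, 1))

# Assumed reading of the last row: educational quotient over median IQ.
ratio_to_iq = quotient(80, 87)

print(round(educational_quotient), round(ratio_to_iq))
```

Both results round to the printed figures, 80 and 92, which supports the reading of the garbled final row as 80/87.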


2. Comparisons.—The following intelligence classification of continuation
pupils is based on the standards given by Professor Terman.¹


TABLE VI.—INTELLIGENCE CLASSIFICATION OF CONTINUATION PUPILS

IQ           Class                           Number    Percentage
Above 140    Near genius, or genius          None
120-140      Very superior                   1         .05
110-120      Superior                        21        1.2
90-110       Normal or average               584       32.0
80-90        Dull                            720       39.4
70-80        Border line deficiency          405       22.2
Below 70     Definite feeblemindedness       93        5.1





It is seen that as many as 498, or 27.3 per cent, are as low as border 
line deficiency; and 93, or 5.1 per cent, are definitely feebleminded. 





1 Terman, L. M.: “The Measurement of Intelligence.” (1916), p. 78.
There are 23 pupils, constituting 1.25 per cent of the whole, who fell
between 60 and 65. It seems probable that but few of the 498 who are
at border line or lower are really profiting by much of the instruction
that it is possible to give them in the Vocational Schools.
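
The class totals quoted in this passage can be checked against one another. The sketch takes the border-line count as 405, the figure implied by the stated total of 498 at border line or lower and by the printed 22.2 per cent; the class labels follow the wording of the text.

```python
# Number of pupils in each class of Table VI (border-line count taken as 405).
counts = {
    "Very superior": 1,
    "Superior": 21,
    "Normal or average": 584,
    "Dull": 720,
    "Border line deficiency": 405,
    "Definite feeblemindedness": 93,
}

total = sum(counts.values())
percentages = {name: round(100.0 * n / total, 1) for name, n in counts.items()}

border_line_or_lower = (
    counts["Border line deficiency"] + counts["Definite feeblemindedness"]
)
share_low = round(100.0 * border_line_or_lower / total, 1)

print(total, border_line_or_lower, share_low)
print(percentages)
```

With that count the classes sum to the 1824 pupils tested, and the 498 pupils at border line or lower make up 27.3 per cent, as stated.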

Only one pupil falls in the very superior class; and 21, or 1.2 per 
cent are superior. In spite of the wide limits of the normal or average 
class (90 to 110) more are found within the dull class than within the 
normal or any other class. Almost as many classify as feebleminded 
and border line as normal. 
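
The counts behind these statements can be tallied directly. The short Python sketch below is only a check on the arithmetic. It assumes that the border line group numbers 405, the figure implied by the stated total of 498 at border line or lower less the 93 classed as feebleminded, and it takes the class names from the text; the function and variable names are illustrative.

```python
# Tally of the intelligence classification. The border line count of 405
# is an assumption inferred from the text (498 at border line or lower,
# less the 93 classed as definitely feebleminded).
COUNTS = {
    "Near genius, or genius": 0,
    "Very superior": 1,
    "Superior": 21,
    "Normal or average": 584,
    "Dull": 720,
    "Border line deficiency": 405,
    "Definite feeblemindedness": 93,
}


def percentage(count, total):
    """Share of the whole group, in per cent, to one decimal place."""
    return round(100 * count / total, 1)


TOTAL = sum(COUNTS.values())
LOW = COUNTS["Border line deficiency"] + COUNTS["Definite feeblemindedness"]

if __name__ == "__main__":
    print("pupils classified:", TOTAL)
    print("border line or lower:", LOW, f"({percentage(LOW, TOTAL)} per cent)")
    print("normal or average:", COUNTS["Normal or average"])
```

On these assumptions the classes sum to 1824 pupils, and the 498 at border line or lower fall only 86 short of the 584 classed as normal or average.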

A comparison between the standing of the continuation group and 
1,700,000 U. S. army recruits is presented in Table VII. 


TABLE VII.—COMPARISON OF CONTINUATION PUPILS AND U. S. ARMY RECRUITS

                                        Continuation,   Recruits,
Mental age          Description         percentage      percentage
18 to 19-6          Very superior            3              4.1
16-6 to 17-11       Superior                4.4             8
15 to 16-5          High average           14.7            15.2
13 to 14-11         Average                42.2            25
11 to 12-11         Low average            35              23.8
9-6 to 10-11        Inferior                4.3            17
Up to 9-5           Very inferior           ...             7.1
Median..........................           13.6            13.2





The continuation group is seen to be somewhat superior in median; 
they also have fewer in the lower classes. Among the recruits, there 
are more in the three upper classes. 


RESULTS OF SIMILAR INVESTIGATIONS


The results of seven other studies of continuation or part-time 
school pupils are presented for comparison with the findings of this 
study. 

TABLE VIII.—STUDIES OF CONTINUATION SCHOOL POPULATION

Investigation               Median IQ   Ages    Number tested   Test used
This study................    85.5      14-18       1824        Terman group
Massachusetts¹............              14-15       1199        Dearborn, series 2
    [illegible]...........    85                     309
    [illegible]...........    84                     890
Wisconsin (two towns)²....    80        14-18        190        Terman group
Honolulu³.................              13-18                   Terman group and national
    [illegible]...........   73-83
    [illegible]...........   77-87
Pennsylvania⁴.............              14-15       1318        Haggerty, delta 2
    [illegible]...........    91                     638
    [illegible]...........    88                     680
Denver opportunity⁵.......    86                     175        Dearborn, series 2
New York City⁶............    70        15-16        768        Otis intelligence

1 Hopkins, T. L.: The Intelligence of the Continuation School Children of
Massachusetts. Harvard Studies in Education, No. 5, 1924.

2 Plenzke, O. M.: A Study of the Mental Abilities of Continuation School
Pupils. Journal of Educational Psychology, Vol. X, June, 1924, p. 42.

3 Carpenter, J. E.: Functions of Mental Tests in the Administration and
Organization of a Vocational School. Vocational Education Magazine, Vol. II,
September, 1923, p. 1.

4 Reedy, Caroline M.: Can Intelligence Tests Help Solve the Continuation
School Classification Problem? Vocational Education Magazine, Vol. I, p. 2.

5 Taken from Hopkins: loc. cit.

6 Clark, Ruth Swan: “The Continuation School Survey.” Jan. 8, 1921. See
also School Courses from a Psychological Point of View. National Vocational
Guidance Association Bull. No. 1, May, 1923, p. 171.

A very close agreement is shown between the results of the present
investigation and the Massachusetts and Denver studies. Mr.
Plenzke’s Wisconsin results and the Pennsylvania results are not far
different. It appears that the median for all ages and both sexes
(which is not given in the report read) for Honolulu would be almost as
near. In fact the only case of serious disagreement is in the New
York investigation. A sufficient explanation for this is found in the
following quotation from the publication referred to in reference 6 in
the table:

“Since that study was made, all children under 17 who are not
high school graduates have been brought into the continuation 
school, so that if the test were given now the average would doubtless 
be higher, as the group would include many Grade VIII graduates 
and many children with some high school training.” 

Below are given the results of investigations of the relative intel- 
lectual capacities of high school and continuation pupils. The first 
three studies referred to are designated as in Table VIII, being the 
same studies as used there. 

TABLE IX.—INTELLIGENCE OF CONTINUATION AND HIGH SCHOOL PUPILS

Investigation          Continuation IQ        Number   High school IQ            Number
Massachusetts.......                           1199    102 (median)               1980
    [illegible].....   85 (median)
    [illegible].....   84 (median)
New York City.......   68-85 (middle half)      768    90-115 elementary school
Denver opportunity..   86 (median)              175    95 Grade IX
Connecticut¹........   41 (median score)        421    90 Grade IX                 910
This study
    School A........   87.3 (median)            359    102 Grade IX                248

1 The results of the Connecticut study are reproduced from Hopkins: “The
Intelligence of the Continuation School Children of Massachusetts.”

In each case there is considerable difference in favor of the high
school groups. In New York City comparison is made with elementary
school, and the contrast is so great that there is no overlapping in
the ranges of the middle halves. The elementary group graduates
will undoubtedly divide up in their attendance at secondary schools,
some going to the continuation schools and others to the various kinds
of schools that are open to Grade VIII graduates. It would appear
that the continuation school is apt to get only pupils of this group who
fall in the lower quartile of the range of intelligence. In most cases
comparisons are made with high school freshmen rather than with
high school pupils as a group. With entire high school populations
the differences would be greater than those shown above.

The superiority of another freshmen high school group is evident in 
the following figures giving the results of the Otis Self-administering 
Test (Intermediate A) given in the fall of 1923 to 12,652 freshmen 
entering the high schools of Chicago. The chronological ages ranged 
from 11 to 18 years with a median of 14 years, 5 months. The median 
mental age was 14 years, 4 months.

                                           Mental ages
School                          Number   Q1       Median    Q3       Median IQ
Chicago high school freshmen    12,652   12-7     14-4      15-6       99.4
Wisconsin continuation.......      359   12-5.8   13-6.5    14-7.7     85.5





Perhaps the most significant comparison that can be made between 
continuation and high school students is by reference to the published 
norms for the Terman Group Test. In the two following tables comparisons
are made on a basis of both age and grade norms with Wisconsin
continuation students. The grade norms are taken from the
Manual of Directions accompanying the Terman Test. The age 
norms are from Intelligence Testing by Pintner. 


TABLE X.—COMPARISON BETWEEN CONTINUATION AND HIGH SCHOOL PUPILS
BASED ON AGE NORMS FOR TERMAN GROUP TEST


          SCORES,     SCORES,
AGE       NORMS       CONTINUATION
14         103          72.3
15         126          74.1
16         146
17         167
18         188          83.6
19         210
Totals                  81.3


TABLE XI.—COMPARISON BETWEEN CONTINUATION AND HIGH SCHOOL PUPILS
BASED ON GRADE NORMS FOR TERMAN TEST


PERCENTAGE   SCORES,     SCORES,
RANK         NORMS       CONTINUATION
10            151          131
25            128          106
50            104           81
75             81           60
90             63           39


CONCLUSIONS AND IMPLICATIONS 
CONCLUSIONS 


Intellectual Capacity of Continuation as Compared with High School 
Pupils.—The first problem of this study as stated in the introduction 
has to do with the comparative intelligence of continuation and high 
school pupils. Comparisons have been made with high school pupils 
of equal or lower ages in seven cases (see Tables IX, X, and XI). All 
the evidence goes to show the inferiority of the continuation pupils in 
intelligence as compared with high school pupils as measured by several 
different tests of intelligence. The differences are too great to be 
accounted for by chance error in sampling, in testing, or even by 
inaccuracies of measurement. Considering the fairly large numbers 
tested (included in this and similar investigations here considered), 
the geographical distribution of the pupils, the use of different tests, 
and the work of several testers, the results show a rather remarkable
agreement. The first question of the problem, then, is answered in 
no uncertain terms. 

Intelligence Level of Type of Material.—This is the second problem 
involved in the investigation. The results of seven studies (see Tables 
V, VI, VII, and VIII) are presented to indicate the general intellectual 
level of the continuation or part-time pupils. Continuation pupils 
as a group are distinguished by low intellectual capacity as measured by 
present abstract intelligence tests. This is shown in: 

1. Their retardation. 

2. Their general intellectual level as compared with other pupils 
of their ages. 

3. Their elimination from the full-time school. 

4. Their achievement in school subjects (see Table V). 

Possibly tests of other kinds of intelligence, of will, of emotional,
moral, and esthetic qualities, and of health and physical soundness
would show quite different results.

Evidently the continuation school, whether it be in Wisconsin or in 
any other state where compulsory continuation education is well under 
way, has a distinct and unique population with which to deal. 


IMPLICATIONS 


1. The existence of the continuation school as a distinct kind of school
is fully justified. Far from proving that formal education is unprofitable
for these boys and girls, the results rather emphasize the necessity
for certain kinds of training in order to assist them to become
useful and happy members of society.

2. The results of this study do not tend to discount the chances for 
success in life of continuation school pupils. Realizing that the test 
used in this investigation is not a perfect measure of general intelli- 
gence, and that it probably overweights the abstract as distinguished 
from mechanical and social aspects of intelligence, a quotation seems 
very pertinent at this point. “Instead of trying to turn the eyes of
all their pupils toward academic work or the learned professions as
the ultimate goal in life, teachers must recognize and teach that effective
service to the community is the common ideal. It must be clear
to them that it is just as noble and worthy to be an efficient butcher or
teamster as to be a good teacher or, indeed, that an effective street-sweeper
may actually be worthy of more public honor and respect
than a successful lawyer.”¹

3. The great need and opportunity for vocational guidance with 
continuation pupils is shown. Here is a group of young people who 
are not simply looking forward to entrance upon a vocation in a few 
years when the period of their schooling is over, as is usually the case 
with high school students, but a group who are already at work, some- 
times through economic pressure, and who may be forced to a choice 
of vocation without careful deliberation and without consideration of 
many things that should be considered. 

4. Instruction must be very simple and concrete; and learning should 
be largely through seeing, hearing, and handling of materials. Instruc- 
tion in the school for part-time pupils is quite liable to be beyond the 
capacity of the pupils. This fact is recognized in the Report of the 
Wisconsin Commission on Industrial and Agricultural Training, in 
1911, which recommended the establishment of continuation schools. 

5. Special care should be taken to adapt objectives to the capacity 
of the pupils and to make them definite and clearly defined. 

6. A major aim of the continuation school should not be to restore its 
boys and girls to the full-time school. It has been shown that continua- 
tion schools have a distinct and unique population with which to deal— 
a population that cannot be well cared for by the high school as it 
exists at present. It is not denied that there are some continuation 
pupils who could successfully carry high school subjects or who might 
with profit be transferred to the high school. 





1 Trabue, M. R.: Some Pitfalls in the Administrative Use of Intelligence Tests. 
Journal of Educational Research, Vol. VI, 1922. 














MEASURING CONSISTENCY 


CARLETON WASHBURNE 


Superintendent of Schools, Winnetka, Illinois 


When consistency of performance is to be measured, the coefficient 
of correlation cannot always be used. One sometimes needs to know 
how consistent one part of a test is with another, how consistent a child 
is with himself in two attempts at a given performance, how consistent 
a test is with the validating criterion, etc. When each element is 
either definitely right or wrong and the scores for the various elements, 
therefore, are either 100 per cent or 0, the coefficient of correlation is 
not a satisfactory measure of consistency. 

The particular necessity which drove us to seek a formula to 
measure consistency was an attempt to validate certain tests in formal 
language. In these tests children were expected to supply missing 
elements of capitalization and punctuation. 

We wished to find out: 


(a) Whether a child who failed to capitalize a person’s name, for example, in 
one part of the test was consistent in his performance in this particular regard in 
the other parts of the test. 

(b) Whether a child who failed to capitalize a person’s name, for example,
in one part of his composition, was consistent in this regard in the rest of his 
composition. 

(c) Whether a child’s performance on his test in any particular regard was 
consistent with his performance in composition in this same regard. 


To illustrate further, child A capitalized names of persons five 
times in his test and failed to capitalize them three times. In his 
composition he capitalized them four times and failed to capitalize 
them twice. How consistent was he in his test? How consistent 
was he in his composition? What was the consistency between his 
test and composition? How did his consistency in these three regards, 
for the capitalization of the name of a person, compare with his con- 
sistency in the use of a period at the end of a sentence, or any other 
given element? 

The notion of consistency may be expressed quantitatively in per- 
centage terms. If a child has six opportunities to get an element right 
or wrong and gets all of them right he may be said to be 100 per cent 
consistent. If he gets this element wrong three times and right three
times, he is as inconsistent as he can be and may, therefore, be considered
as having a consistency of zero. If he is right four times and
wrong twice, his consistency will be 33 per cent. This is evident in
the following diagram:


+ + +
+ - -


To agree with himself, the child must make at least two attempts. We 
may therefore divide his total of six attempts into three pairs. In one 
of these pairs he is consistent; in the other two pairs he is not. He is, 
therefore, consistent one-third of the time and his consistency may be 
stated as 33 per cent. 

If the child is right five times out of six his consistency is 67 per cent 
as will be seen from the following diagram: 


+ + + 
+ + - 


In two-thirds of the possible opportunities for consistency, the child 
has been consistent; the remaining third he has been inconsistent. His 
consistency is, therefore, 67 per cent. 

Suppose a child has used an element correctly four times out of 
five; his consistency may be found by doubling both numbers and con- 
sidering four out of five as the equivalent of eight out of ten. His con- 
sistency will be found to be 60 per cent as shown in the following 
diagram: 

+ + + + +
+ + + - -


In three-fifths of the cases he has been consistent with himself and in 
two-fifths of the cases he has been inconsistent. 


The formula derived from this reasoning is:

$$C = \frac{2R - A}{A} \tag{1}$$

where $C$ is the coefficient of consistency, $R$ is the
number of rights, and $A$ is the number of attempts. Thus, if a child
has four rights out of five attempts:

$$C = \frac{(2 \times 4) - 5}{5} = \frac{3}{5} = 60 \text{ per cent}$$





This will yield a negative number wherever the child was wrong
more often than right. As far as his consistency is concerned, the sign
may be dropped as of no significance.
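
The worked cases above can be checked mechanically. The following Python sketch is illustrative only: the function name `consistency` is hypothetical, and exact fractions are used so that one-third and two-thirds are not rounded before display. It applies formula (1) and drops the sign, as the text directs.

```python
from fractions import Fraction


def consistency(rights: int, attempts: int) -> Fraction:
    """Coefficient of consistency C = (2R - A) / A, with the sign dropped."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= rights <= attempts:
        raise ValueError("rights must lie between 0 and attempts")
    return abs(Fraction(2 * rights - attempts, attempts))


if __name__ == "__main__":
    # (rights, attempts) pairs taken from the examples in the text,
    # plus one case (1 of 5) where wrongs outnumber rights.
    for r, a in [(6, 6), (3, 6), (4, 6), (5, 6), (4, 5), (1, 5)]:
        print(f"{r} right out of {a}: C = {float(consistency(r, a)):.0%}")
```

A child who is right once in five attempts receives the same coefficient as one who is right four times in five, since the sign carries no meaning for consistency.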

Frequently it is more convenient, especially in the light of the formula
about to be derived for measuring interconsistency between two
not entirely consistent variables, to state the number of rights out of
attempts in terms of percentage. Thus, if a child gets four rights out
of five attempts, his per cent of accuracy is 80. This may be used in
the following formula:

$$C = 2R_p - 100 \tag{2}$$

where $R_p$ is the per cent of accuracy. Substituting in the case cited:

$$C = (2 \times 80) - 100 = 60 \text{ per cent}$$


Now, let us suppose that we want to find the interconsistency 
between two different measures. Suppose, for example, that in a lan- 
guage test a child has succeeded in capitalizing the names of persons 
four times out of five, but that in his composition he has capitalized 
the names of persons three times out of four. What is the consistency 
between his performance on his test and his performance on his 
composition? 

To solve this problem it is necessary to change both the
fractions (4/5 and 3/4) to a common denominator; in this case, 20. The
child may be considered, therefore, as having 16 rights out of 20
attempts in his test and 15 rights out of 20 attempts in his composition.
(See diagram.)


+ + + + + + + + + + + + + + + + - - - -
+ + + + + + + + + + + + + + + - - - - -


It will be seen from the diagram that in only 1 chance out of 20 has the 
child been inconsistent as between test and composition. His inter- 
consistency, therefore, is 95 per cent. 

Interconsistency, of course, applies to both positive and negative 
elements. A child who gets three elements out of five right in a test, 
and three out of five right in a composition is, obviously, much more 
consistent as between test and composition than the child who gets 
three out of five right in the test and all five right in the composition, 
even though the composition is more consistent within itself in the 
latter case than in the former. Interconsistency has nothing to do 
with rightness or wrongness per se. A child whose consistency in a 
test was zero (i.e., rights and wrongs were equal), and whose consist- 
ency in composition was also zero, would have an interconsistency of 
100 per cent—perfect—just as would a child whose consistency in test 
was 100 per cent and in composition 100 per cent. In both cases the 
child’s performance was the same in both test and composition. 
The formula for interconsistency requires that the
$\frac{\text{rights}}{\text{attempts}}$ for
each of the two measures to be compared be stated fractionally and
that the fractions be changed to fractions with a common denominator.
The formula then becomes

$$C_{12} = \frac{R_1 + (A - R_2)}{A} \tag{3}$$

where $R_1$ and $R_2$ are the numerators of the fractions with the common
denominator $A$, $R_1$ indicating the number of rights (after the fraction
has been changed to a common denominator) of one measure, $R_2$ that
of the other, and where $R_1$ is the smaller of the two values $R_1$ and $R_2$.
In the above cited example, since the child got four rights out of five
in his test and three rights out of four in his composition, the two
fractions become 16/20 and 15/20 when changed to a common denominator.
$R_1$ is 15 (the smaller number is always used for $R_1$ in order to avoid an
improper fraction) and $R_2$ is 16. And $A$ is 20. Substituting in the
formula:

$$C_{12} = \frac{15 + (20 - 16)}{20} = \frac{19}{20} = 95 \text{ per cent}$$

It is usually easier to handle all cases in terms of a common denominator
of 100, in other words to change all statements of
$\frac{\text{rights}}{\text{attempts}}$ to
per cent of rights. When this is done the formula becomes

$$C_{12} = R_{p_1} - R_{p_2} + 100 \tag{4}$$

Substituting in the example cited above, the child who got four
rights out of five attempts in his test was 80 per cent right. Since
he got three rights out of four in his composition, he was 75 per cent
right. $R_{p_1}$ is always the smaller of the two percentages. Substituting
in the formula we get

$$C_{12} = 75 - 80 + 100 = 95 \text{ per cent}$$


Through the use of formula (1) or (2) it is possible to determine 
quantitatively the consistency within any given measure. The 
consistency, for example, with which a person gets the correct answer 
to any particular recurring type of arithmetic example, the consistency 
with which he correctly (or incorrectly) uses any element of punc- 
tuation, capitalization or grammatical usage—wherever a single 
element recurs repeatedly in a test or other performance, the consistency
of a person’s behavior toward this element is determinable
through formula (1) or (2).

Similarly, when there are two different sets of opportunities for the 
recurring use of a given element, such as addition facts in an addition 
fact test and these same facts in a problem test, or such as language 
elements or spelling words in a formal test and these same elements or 
words in a composition, it is possible through formula (3) or (4) to 
determine quantitatively the degree of interconsistency that exists 
between a person’s behavior in regard to the elements measured under 
two different situations. 
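
As an illustration of the interconsistency measure, the sketch below works from the raw rights and attempts of each measure. It is a hedged reading of formula (3), not the author's own procedure: the function name is hypothetical, and dividing through by the common denominator lets Python's `Fraction` type do the conversion that the text performs by hand.

```python
from fractions import Fraction


def interconsistency(rights_a: int, attempts_a: int,
                     rights_b: int, attempts_b: int) -> Fraction:
    """Interconsistency of two measures.

    Formula (3) is (R1 + (A - R2)) / A with R1 the smaller numerator
    over the common denominator A. Dividing each term by A gives
    smaller + (1 - larger), where both are proportions of rights.
    """
    p_a = Fraction(rights_a, attempts_a)
    p_b = Fraction(rights_b, attempts_b)
    smaller, larger = min(p_a, p_b), max(p_a, p_b)
    return smaller + (1 - larger)


if __name__ == "__main__":
    # Test: four rights of five; composition: three rights of four.
    c = interconsistency(4, 5, 3, 4)
    print(f"interconsistency = {c} = {float(c):.0%}")
    # The percentage form: smaller per cent - larger per cent + 100.
    print("percentage form:", 75 - 80 + 100)
```

Two measures with equal proportions of rights give an interconsistency of 1 whatever that proportion is, which matches the remark that interconsistency has nothing to do with rightness or wrongness as such.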





NOTE ON THE CALCULATION OF PERCENTILE 
RANKS 


L. L. THURSTONE 


University of Chicago 


In the tabulation of test data from many colleges on the psycho- 
logical examination of the American Council on Education I have been 
using a short cut in the calculation of percentile ranks which gives, 
however, the correct percentiles. It is not an approximation pro- 
cedure. This short cut is so simple that the column of percentiles for 
the accompanying frequency distribution of 5198 cases can be calcu- 
lated correctly in a few minutes and it does not necessitate either plot- 
ting or interpolation. As far as I am aware this method has not been 
described in the textbooks on statistics. 

Before describing the method, a rather frequently occurring error 
will be discussed which is avoided by the present short cut. In order 
to make the error conspicuous let us consider, as an example, a popula- 
tion of only ten cases. Let the ten individuals have different scores; 
no two of them have the same score. When the ten individuals have 
been arranged in rank order from lowest to highest score, what per- 
centiles should be assigned to them? Of course, in practice one would 
not bother to calculate percentiles for so small a population. It is 
used here as an extreme example to explain a point. The first impulse 
might be to assign the ten individuals percentile ranks of .10, .20, .30,
. . . 1.00 but that would be incorrect. The true percentile ranks
are .05, .15, .25, .35, . . . .95. Another rather common but erroneous
definition, when taken literally, is that the percentile rank of a
person is the proportion of the whole group that has scores lower than 
he has. According to this definition the ten percentiles in the above 
example would be .00, .10, .20, .30, . . . .90 which is also incorrect. 

The true percentile rank must be so defined that the percentile 
curve is the integral of the corresponding frequency distribution and 
with such a definition percentiles will be assigned on the same basis for 
large and for small groups, for large and for small class intervals, for 
large and for small class frequencies. According to this correct 
interpretation, every member of a group of n individuals is assigned 
1/n of the entire percentile range from .00 to 1.00. The percentile 
rank assigned to each person is the midpercentile of the percentile range 
that he occupies. The same reasoning is applied to the class intervals. 
Every class interval has a range, a piece of the percentile range from
.00 to 1.00, and the percentile assigned to each person in that class 
interval is the mid-percentile of the percentile range of the class interval. 

This again becomes clearer on considering an extreme example. Suppose
that 20 per cent of the group make perfect scores. What percentile 
should be assigned to a perfect score? The perfect score has a per- 
centile of .90 since that is the mid-percentile of the percentile range 
.80-1.00, which is covered by the perfect scores. Similarly, if 16 
per cent of the group have zero scores, every zero score gets a percentile 
of .08. To assign the zero score a percentile of .00 is to adopt the lower 
edge of its percentile range, whereas to assign a perfect score a per- 
centile of 1.00 is to adopt the upper edge of its percentile range. This 
would be inconsistent. 
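
The definition can be stated in a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, and it assumes the scores have already been grouped into a list of class frequencies ordered from the lowest score to the highest.

```python
def midpoint_percentile_ranks(frequencies):
    """Percentile rank of each class: the middle of the slice of the
    range .00 to 1.00 that the class occupies."""
    n = sum(frequencies)
    ranks = []
    below = 0  # number of cases in all lower classes
    for f in frequencies:
        ranks.append((below + f / 2) / n)
        below += f
    return ranks


if __name__ == "__main__":
    # Ten individuals, no two with the same score.
    print([round(r, 2) for r in midpoint_percentile_ranks([1] * 10)])
    # 20 per cent of the group make perfect scores (last class).
    print(round(midpoint_percentile_ranks([80, 20])[-1], 2))
    # 16 per cent of the group have zero scores (first class).
    print(round(midpoint_percentile_ranks([16, 84])[0], 2))
```

The three calls reproduce the cases discussed above: ranks running from .05 to .95 for the ten individuals, .90 for the perfect scores, and .08 for the zero scores.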

In the accompanying frequency table, the percentile rank of a 
score of 8, for example, can be considered as the proportion of true scores 
below 8, provided that half of the people with a score of 8 be considered 
to have true scores below 8 while half of them have true scores above 8. 

Let the first two columns of the accompanying table constitute the 
given frequency table. Add the f-column which gives the number of 
cases; n = 5198. Calculate the value of half the rate which is

$$\frac{1}{2n} = \frac{1}{2 \times 5198} = 0.00009619.$$

Set this value on the keyboard of the calculating machine and keep it
there untouched throughout the rest of the calculations.

Multiply this value on the keyboard by the first frequency which is 
50. The answer is .005 which is the percentile rank that should be 
assigned to an obtained score of 0. Clear the upper dial and multiply 
again by 50. The cumulative result is .010 which is the percentile 
rank of a true score of 1.00. It is not used because every actual score 
of 1 is treated as though it were a true score of 1.5. It may or may not 
be recorded. 

Clear the upper dial and multiply by the next frequency, 151. The 
result is .024 which is the percentile assigned to an obtained score of 1. 
Clear the upper dial and repeat. The result is .039 which is the per- 
centile of a true score of 2 but it is not used because every actual score 
of 2 is given the midpercentile for the group of scores of 2. 

Clear the upper dial and multiply by the next frequency, 250. 
The result is .063 which is the percentile rank of every obtained score 
of 2. Clear the upper dial and repeat. The result is .087.

Continue likewise for all the other frequencies, multiplying each
one twice as shown. The last value should be 1.000 which constitutes
a check on the arithmetical work.

TABLE I

Score   Class frequency   Percentile rank   Check column
  x            f               PRm

  0           50              .005             .010
  1          151              .024             .039
  2          250              .063             .087
  3          371              .122             .158
  4          504              .207             .255
  5          514              .305             .354
  6          612              .413             .472
  7          526              .522             .573
  8          532              .624             .675
  9          442              .718             .760
 10          348              .794             .827
 11          256              .852             .876
 12          195              .895             .914
 13          150              .928             .943
 14          110              .953             .964
 15           77              .971             .979
 16           45              .983             .987
 17           32              .991             .994
 18           20              .996             .997
 19            9              .998             .999
 20            4              .9996           1.000

n = 5198.
Rate = 1/n = 0.00019238.
1/2n = 0.00009619.


In preparing a percentile table for actual use, only the first and 
third columns are included. The fourth column should always be 
omitted because its only purpose is to serve as a check in calculation. 
By means of it one will always know whether a multiplication just 
completed is the first or the second of the double multiplication for
each frequency and its final value of 1.000 proves the arithmetical
accuracy of the whole table. 
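
The double multiplication can be imitated in Python, with a running total standing in for the accumulating dial of the calculating machine. This is a sketch under stated assumptions, not a description of any particular machine: the names are hypothetical, and the frequency for a score of 16 is taken as 45, the value implied by n = 5198 and the remaining frequencies.

```python
# Class frequencies for scores 0 through 20. The entry for score 16 (45)
# is an assumption: it is the value that makes the column sum to 5198.
FREQUENCIES = [50, 151, 250, 371, 504, 514, 612, 526, 532, 442, 348,
               256, 195, 150, 110, 77, 45, 32, 20, 9, 4]


def percentile_table(frequencies):
    """Return (score, f, percentile rank, check value) for each class."""
    n = sum(frequencies)
    half_rate = 1 / (2 * n)  # the value kept on the keyboard
    accumulator = 0.0        # the accumulating dial
    rows = []
    for score, f in enumerate(frequencies):
        accumulator += half_rate * f  # first multiplication: mid-percentile
        pr_mid = accumulator
        accumulator += half_rate * f  # second multiplication: check column
        rows.append((score, f, pr_mid, accumulator))
    return rows


if __name__ == "__main__":
    for score, f, pr_mid, check in percentile_table(FREQUENCIES):
        print(f"{score:>2} {f:>4} {pr_mid:.3f} {check:.3f}")
```

Each frequency is entered twice, so the running total ends at 1.000 after the last class, which is the arithmetical check described above.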
The present method is especially useful for class intervals of unity
and generally for frequency tables in which the class interval is equal 
to the scoring unit. It is also correct for class intervals of any magni- 
tude if it is desired to assign a percentile to each class interval without 
interpolation. 

When a percentile curve is plotted for purposes of interpolation an 
error is introduced by the methods described in the current text books. 
If the curve is to be used for interpolation it should be moved down 
half a score unit on the score axis but for distributions with a large 
number of class intervals and with five or ten scoring units in each 
class interval this error is negligible. It is avoided entirely, however, 
in the accompanying example. 

The short cut here described is so simple that a clerk can learn it in 
a few minutes. I doubt whether a correct percentile table can be 
constructed so rapidly and simply by any of the usual methods of 
calculation which involve plotting and interpolation. The Otis 
percentile graph, for the example here described, would require inter- 
polation for the correct percentiles since the accumulative percentages 
would not give directly the correct percentiles. 
THE RELIABILITY OF THE STANFORD-BINET SCALE
AND THE CONSTANCY OF INTELLIGENCE 
QUOTIENTS 


EDWARD A. LINCOLN 


Harvard University 


In a previous discussion of the constancy of intelligence quotients 
obtained by the use of the Stanford Revision, the present writer 
pointed out that we cannot come to a conclusion on this subject until 
we know how much or how little the intelligence quotients of a group of 
children are likely to vary if the examinations are repeated immediately 
or on successive days.¹ The same point is also mentioned by Freeman
in his recent study of mental tests,² and has undoubtedly been considered
by many other workers who use the Stanford or other scales.
It is the purpose of this article to report an experiment which throws 
some light upon this question. 

The subjects for the experiment were children in a small city school 
who were between six and seven years old, chronologically. It was 
considered advisable to use children of the same chronological age in 
order to avoid the questionable procedure of partial correlation tech- 
nique in getting rid of the age factor, and also because of the possibility 
that the reliability of the scale may be different at different ages. 
Thirty-two cases were found in the first, second and third grades, but 
two of these were absent at the time of the re-examination, so the 
experimental group consisted of 30 cases. There were a few six-year- 
old children in the kindergarten, but they could not be examined with 
the others since they did not return to school in the afternoon. The 
tested group is, as a whole, slightly under average, since the median 
(counted) intelligence quotient earned on the first examination was 96, 
and changed to 99 on the second examination. 

The first examinations were given in the morning and the second 
examinations in the afternoon. The noon recess is an hour in length, 
and the interval between examinations was three and one-half to four
hours. A definite formula for the introduction to the second examination
was adopted, as follows: “We had such a good time this morning that I
thought you would like to come to see me again this afternoon.” No
reference whatever was made to the quality of the child’s previous 
performance. 





1 Lincoln, E. A.: The Constancy of Intelligence Quotients. Journal of
Educational Psychology, Vol. XIII, No. 8, p. 484.
2 Freeman, F. N.: “Mental Tests.” Houghton Mifflin Co., 1926, p. 344.
All the examinations were administered by the writer himself, and
a definite order of presenting the first few test items was adhered to 
rather closely. This order was as follows: 
Weight discrimination, V, 1. 
Counting 13, VI, 3. 
Missing parts, VI, 2. 
Right and left, VI, 1. 
Comprehension, VI, 4; IV, 5; VIII, 3. 
Coins, VI, 5. 
Memory for sentences, III, 6; IV, alt; VI, 6.
Fingers, VII, 1. 
Picture description, VII, 2. 
From this point the procedure varied somewhat according to the 
degree of success on the above tests. In general, it is the practice of 
the writer to work up the scale until the upper limit is reached, and then 
work down to the basal year. It probably is better for the child to 
leave him with the memory of a series of successes rather than of a 
series of failures. 


Distributions of the changes in mental age and intelligence quotient 
are shown in Tables I and II. 




TABLE I.—CHANGES IN MENTAL AGE

Months       0    1    2    3    4    5    6    Median
Gains             0    8    0    3    1    3     2.9
Losses            1    4    0    1    0    0     2.5
Total        9    1   12    0    4    1    3     2.4

TABLE II.—CHANGES IN IQ

Points       0    1    2    3    4    5    6    7    8    Median
Gains             2    6    0    0    2    1    1    3     3.9
Losses            1    2    2    0    1    0    0    0     3.0
Total        9    3    8    2    0    3    1    1    3     3.4


The mental age changes range from 0 to 6 months, with a median
of 2.4 months. There are 15 cases of gain and 6 of loss, with no change
in 9 cases. The gains are not only more numerous than the losses, but
are somewhat larger, on the average, although the modal change is the
same in each case.

The changes in IQ vary from 0 to 8 points, with a median of 3.4. 
The gains are more numerous than the losses, and average nearly a full 
point larger. Two-thirds of the changes are two points or less. On 
the other hand, there were three cases—10 per cent of the total—in 
which the gain was 8 points. 





1 Lincoln, E. A.: Time Saving in the Stanford-Binet Test. Journal
Educational Psychology, Vol. XIII, No. 2, Feb., 1922.

In Table III are set down the means and standard deviations of the
distributions of changes, together with the values of 3.5 and 5 times 
the standard deviations. 


TABLE III.—MEANS AND STANDARD DEVIATIONS OF THE MENTAL AGE AND IQ
CHANGES

                        Mean      σ      3.5σ      5σ
Mental age (months)     2.13    1.91     6.69     9.55
IQ (points)             2.57    2.63     9.21    13.15


The mean change in mental age was 2.13 months with a standard 
deviation of 1.91 months. The mean IQ change was 2.57 points with 
a standard deviation of 2.63 points. The other figures are significant 
in that they show the possible changes if a greater number of cases had 
been obtained. Thus, in a distribution which reached out to 3.5σ on
either side of the mean, we could expect to find a gain of about twelve
and a loss of nearly seven points in IQ. If the number of cases were
increased until the range became 5σ, there would appear a gain of nearly
16 points, and a loss of 11 or 12.
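
The means and standard deviations quoted above can be checked from the distributions of changes. The following Python sketch is only an illustration, not the writer's own procedure. It assumes the counts in the Total rows of Tables I and II (illegible entries inferred from the stated total of 30 cases), treats every change as unsigned, and uses the population form of the standard deviation. The names `MA_COUNTS`, `IQ_COUNTS` and `mean_and_sd` are hypothetical.

```python
from math import sqrt

# Number of children showing each size of change, regardless of direction.
# Assumed from the Total rows of Tables I and II (30 cases each).
MA_COUNTS = {0: 9, 1: 1, 2: 12, 3: 0, 4: 4, 5: 1, 6: 3}              # months
IQ_COUNTS = {0: 9, 1: 3, 2: 8, 3: 2, 4: 0, 5: 3, 6: 1, 7: 1, 8: 3}   # points


def mean_and_sd(counts):
    """Mean and population standard deviation of a frequency distribution."""
    n = sum(counts.values())
    mean = sum(value * freq for value, freq in counts.items()) / n
    variance = sum(freq * (value - mean) ** 2 for value, freq in counts.items()) / n
    return mean, sqrt(variance)


for label, counts in (("MA", MA_COUNTS), ("IQ", IQ_COUNTS)):
    mean, sd = mean_and_sd(counts)
    # Print mean, sd and the two multiples of sd used in Table III.
    print(label, round(mean, 2), round(sd, 2), round(3.5 * sd, 2), round(5 * sd, 2))
```

Under these assumptions the means and standard deviations agree with Table III to two decimals. The 3.5σ and 5σ columns of the table appear to have been obtained by multiplying the already rounded σ, so the last digit may differ slightly from the unrounded product.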

An interesting aspect of the changes is presented in Table IV which 
shows the percentage of subjects who were consistent in their perform- 
ances on the separate tests. 


TABLE IV.—CONSTANCY OF PERFORMANCE IN THE SEPARATE TESTS OF THE
STANFORD REVISION; PER CENT OF RESPONSES WHICH DID NOT CHANGE

                                     Year
Test                  IV       V       VI      VII     VIII     IX
1                    93.8    93.1    90.0    96.6    91.3   100.0
2                    93.8    96.5    73.3    90.0    91.3    92.3
3                    87.5    92.9    83.3    96.6   100.0   100.0
4                   100.0   100.0    86.6    86.6   100.0    92.9
5                    87.5    92.9   100.0   100.0    91.3   100.0
6                   100.0    86.2    86.6    93.3   100.0   100.0
Number of cases       16     28-29    30      30      23     7-15

Obviously there are four possibilities: (1) to pass the test both
times, (2) to fail the test both times, (3) to pass in the morning and fail
in the afternoon, and (4) to fail on the first trial and pass on the second.
In the bow knot test, where half credits are allowed, there are further 
possibilities. Only those tests which at least ten pupils attempted are 
reported upon in the table, except in the case of IX, 6, for which there 
were only seven cases. 

It is surprising to note that in only a third of the cases, 12 out of 36
tests, were the performances absolutely consistent. These tests are
as follows:
IV, 4. Drawing a square                  VIII, 4. Similarities
IV, 6. Memory for four digits            VIII, 6. Vocabulary
V, 4. Definitions in terms of use        IX, 1. Date
VI, 5. Naming four coins                 IX, 3. Making change
VII, 5. Differences                      IX, 5. Making sentences
VIII, 3. Comprehension                   IX, 6. Making rhymes
The tests in which the greatest variation appeared were as follows:
V, 6. Following directions               VI, 4. Comprehension
VI, 2. Missing parts                     VI, 6. Memory for sentences
VI, 3. Counting 13                       VII, 4. Bow knot


The changes in the missing parts test were most numerous, and the 
performances remain inconsistent even when allowance is made for 
the fact that subjects who fail in the first item of the test are shown that 
the mouth is missing. 

Out of 60 inconsistent performances, 38 were failures in the morning 
followed by successes in the afternoon. The other 22 were cases of 
initial success followed by failure. The counting thirteen test was 
most remarkable in this respect, in that four children who succeeded 
in their first attempt failed in their second. 

A rather unexpected finding was that the superior children did not
make the largest gains. There were three who earned IQ’s of 110 or
more on the first test; of these, one gained three points, one lost two
points, and one lost five points. Of the six whose initial IQ’s were 85
or under, one remained constant, one lost one point, one gained three
points, one gained five points, one gained six points, and one gained
eight points.

A number of correlation studies were made for the purpose of
determining reliability coefficients. The calculations were made by the
Spearman Rank Difference Squared method, with the use of the Scott
Laboratory tables, and the results were translated into terms of r.¹
These coefficients are recorded in Table V, together with their probable
errors, and the probable errors of measurement, calculated from the
formula $0.6745\,\sigma\sqrt{1 - r}$. In the cases where the examination
was split, the coefficients have been stepped up by the Spearman-Brown
formula.
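
The two adjustments just mentioned can be illustrated numerically. The Python sketch below is offered only as an illustration under stated assumptions. `spearman_brown` uses the usual doubled-length form 2r/(1 + r). `pe_measurement` follows the formula quoted in the text. `pe_of_r` uses the conventional expression 0.6745(1 − r²)/√N, which the text does not state but which agrees with the ± .012 reported for r = .95 with 30 cases. The function names, the half-test coefficient of .68 and the σ of 12 are hypothetical.

```python
from math import sqrt


def spearman_brown(r_half):
    """Step up a half-test correlation to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def pe_measurement(sigma, r):
    """Probable error of measurement: 0.6745 * sigma * sqrt(1 - r)."""
    return 0.6745 * sigma * sqrt(1 - r)


def pe_of_r(r, n):
    """Probable error of a correlation coefficient (assumed conventional form)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


# Illustrative values only; sigma = 12 and r_half = 0.68 are not from the study.
print(round(spearman_brown(0.68), 2))
print(round(pe_of_r(0.95, 30), 3))
print(round(pe_measurement(12.0, 0.95), 1))
```

Because the standard deviations of the mental ages and IQ's themselves are not reported, the probable errors of measurement in Table V cannot be recomputed exactly from the text alone.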

1 For the calculations of the correlations and probable errors of measurement,
the writer is indebted to Miss Pauline D. Dodge of the mathematics department
of the Somerville, Massachusetts, high school.

The reliabilities of the separate morning and afternoon examinations
were figured in two ways: first, by splitting the tests after the
method of Otis¹ and comparing the results on the combination of the
first three tests in each year with the results on the combination of the
last three tests; second, by combining the odd numbered tests to get
one mental age and intelligence quotient, and the even numbered tests
to get the second set of results. The first method has been used a
number of times in studying this problem, but the second has not
previously come to the writer’s attention. It will be noted that the
reliability by the odd-even method is slightly greater than that
obtained by the Otis method of dividing the tests.

TABLE V.—RELIABILITY COEFFICIENTS AND PROBABLE ERRORS OF MEASUREMENT

                                                  r          PE (meas.)
First and second examinations            MA   .95 ± .012    1.3 months
                                         IQ   .95 ± .012    1.8 points
First examination                        MA   .81 ± .043    2.9 months
  First three tests vs. second three     IQ   .77 ± .051    4.2 points
Second examination                       MA   .73 ± .057    3.4 months
  First three tests vs. second three     IQ   .70 ± .063    4.6 points
First examination                        MA   .88 ± .027    2.3 months
  Odd vs. even tests                     IQ   .86 ± .033    3.2 points
Second examination                       MA   .82 ± .040    2.7 months
  Odd vs. even tests                     IQ   .80 ± .044    3.8 points

The reliability coefficients of .95 for both mental age and intelli- 
gence quotient indicate a high reliability of the Stanford Scale when 
tested under the conditions of this experiment. The small probable 
errors of measurement substantiate this finding. It appears that in a 
large group half the IQ changes may be expected to be under two 
points. These results may be compared to the findings of Terman 
and Cuneo² who re-tested 25 children after an interval of 48 hours, and
obtained a reliability coefficient of .95, together with a median change of 
three points in IQ. 

Thus it appears that the reliability of the Stanford-Binet scale is 
highly satisfactory, at least for six-year-old children. 

However, for the student interested in the investigation of constancy
of intelligence quotients, this reliability is no more important
than the fact that in 10 per cent of the cases there was a change of
eight points in IQ when a second examination was given within four
hours from the time of the first. Shall we be surprised, then, if we
find that after one or more years a fifth or a quarter of the IQ’s have
changed ten or more points?

1 Otis, A. S., and Knollin, H. E.: The Reliability of the Binet Scale and of
Pedagogical Scales. Journal Educational Research, Vol. IV, Sept., 1921, p. 121.
2 Terman and Cuneo: IQ Tests of 112 Kindergarten Children and 77 Retests.
Pedagogical Seminary, Vol. XXV, 1925, p. 415.

The small number of cases used in this experiment casts some 
shadow of doubt over the validity of the results. Also, it cannot be 
stated with any degree of certainty that similar findings would appear 
in other age groups. This study is, then, more valuable as an example 
of technique than for the results which it presents. Similar studies 
need to be made with larger groups of children at all ages, for knowl- 
edge of the reliability of the Stanford Scale and other scales is highly 
desirable, especially as such knowledge bears on the constancy of the 
intelligence quotient and other important related problems. 

Summary.—1. This article reports the results of an experiment in 
which 30 six-year-old children were given two Stanford-Binet exami- 
nations on the same day. 

2. The range of changes in mental age was from 0 to 6 months, 
with a median of 2.4 months and a mean of 2.13 months. 

3. The range of changes in IQ was from 0 to 8 points, with a median 
of 3.4 points, and a mean of 2.57 points. 

4. In both MA and IQ the gains were more numerous and averaged 
larger than the losses. 

5. Consistency of performance varies considerably in the separate 
tests. In only 12 out of 36 tests was there absolute consistency. 

6. A study of inconsistent performances shows that in nearly 37 
per cent of these a test was failed on the second examination which had 
been passed on the first. 

7. Children with IQ’s under 85 on the first test gained substantially,
while two out of three with IQ’s over 110 lost on the second
examination.

8. The reliability coefficient in the case of both mental age and 
IQ was .95. 

9. The probable error of measurement was 1.3 months for mental 
age and 1.8 points for IQ. 

10. In spite of the high degree of reliability, it is a very noticeable 
and significant fact that 10 per cent of the cases varied eight points in 
IQ within four hours.

11. This study needs to be supplemented by similar studies for 
children at other ages. 










NOTE ON THE COMPUTATION OF THE RANK- 
DIFFERENCE CORRELATION COEFFICIENT 


EDWARD E. CURETON 


Stanford University 


Rank correlation methods are most commonly employed when 
dealing with small numbers of cases, where the sampling errors do not 
justify the added labor of the more exact product-moment methods. 
In such cases, it is justifiable to make use of certain approximations in 
the computation. 





Spearman’s formula is ordinarily written
$\mathrm{Rho} = 1 - \frac{6\Sigma d^2}{N(N^2 - 1)}$.
It may be written $\mathrm{Rho} = 1 - \Sigma d^2 \cdot \frac{6}{N^3 - N}$.
In the computation of $\Sigma d^2$, a table is prepared containing in
successive columns the names of the subjects, their scores in each of
the two variables measured (X and Y), their ranks in these variables
($K_x$ and $K_y$), the differences between their ranks in the two
variables taken irrespective of sign ($d = K_x - K_y$ or $K_y - K_x$),
and the squares of these differences. (See Table I.) When two or more
subjects have the same score in one of the variables, they are all given
the same rank: the average of the successive ranks next in order
corresponding to the several repetitions of the score. It should be
noted that this average rank will always be represented either by an
integer or by a mixed number consisting of an integral part and the
fraction .5, according as the number of recurrences of the score is odd
or even. Hence the differences in rank may vary in steps of .5 from 0 to
N − 1. If the fractions are omitted entirely in the $d^2$ column, the Rho
will not ordinarily be significantly changed. Table II gives squares
from .5 to 49 by halves, omitting fractions.


In evaluating $\frac{6}{N^3 - N}$, since the only variable is N, this
factor is easily tabulated. Table III gives values of this expression to
three significant figures, for values of N from 10 to 50.
Having evaluated these factors, the solution of the formula is
comparatively simple. We multiply $\Sigma d^2$ by $\frac{6}{N^3 - N}$,
“round off” the product to three significant figures, subtract this
product from 1 by subtracting the right-hand digit from 10 and each of
the others from 9, and “round off” the answer to two figures. This answer
is Rho, the rank-difference correlation coefficient. (See Table I.)


TABLE I.—ILLUSTRATING THE COMPUTATION OF SPEARMAN’S Rho

Subject     X     Y     $K_x$    $K_y$     $d$    $d^2$
   1        5     3      7.5      9.5      2        4
   2        6     1      9.5      3.5      6       36
   3        3     4      3       11        8       64
   4        7     5     11.5     12         .5      0
   5        2     2      2        7        5       25
   6        5     1      7.5      3.5      4       16
   7        6     2      9.5      7        2.5      6
   8        9     3     14        9.5      4.5     20
   9        1     0      1        1        0        0
  10        4     2      5        7        2        4
  11        7     1     11.5      3.5      8       64
  12        4     1      5        3.5      1.5      2
  13        8     6     13       13        0        0
  14       10     7     15       14        1        1
  15        4     8      5       15       10      100

N = 15. $\Sigma d^2 = 342$.
From Table III, $\frac{6}{N^3 - N} = .00179$.
$342 \times .00179 = .61218 = .612$, approx.
$1 - .612 = .388 = .39$, approx., = Rho.
From Table IV, r = .41.
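
The computation in Table I can be retraced in a few lines of Python. This is a minimal sketch, not the author's own tabular procedure. The scores are those of Table I, with the Y score of subject 2 (illegible in the source) taken as 1, the only value consistent with its rank of 3.5. The function names are hypothetical, and the rounding to three significant figures imitates the use of Tables II and III.

```python
from math import floor

X = [5, 6, 3, 7, 2, 5, 6, 9, 1, 4, 7, 4, 8, 10, 4]
Y = [3, 1, 4, 5, 2, 1, 2, 3, 0, 2, 1, 1, 6, 7, 8]  # subject 2's Y inferred from its rank


def average_ranks(scores):
    """Rank scores with the lowest = 1; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of the 1-based positions of the tie
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rho_approx(x, y):
    """Rho by the shortcut: fractions dropped from d^2, factor to 3 figures."""
    kx, ky = average_ranks(x), average_ranks(y)
    sum_d2 = sum(floor((a - b) ** 2) for a, b in zip(kx, ky))
    n = len(x)
    factor = float(f"{6 / (n ** 3 - n):.3g}")   # as tabulated in Table III
    product = float(f"{sum_d2 * factor:.3g}")   # rounded to three figures
    return sum_d2, round(1 - product, 2)


def rho_exact(x, y):
    """Rho from the ordinary formula, keeping the fractional parts of d^2."""
    kx, ky = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(kx, ky))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


print(rho_approx(X, Y))
print(rho_exact(X, Y))
```

Comparing the two functions on the same data shows how little is lost by dropping the fractions: the exact sum of squares here is 343 instead of 342, which does not alter the two-figure value of Rho.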


If the distributions of the original scores are approximately normal,
we may estimate the value of r, the corresponding product-moment
correlation, by the formula
$r = 2 \sin\left(\frac{\pi}{6}\,\mathrm{Rho}\right)$. If two-figure
accuracy is all that is required, the simple set of values of this function
given in Table IV will suffice.
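
Where a table of this function is not at hand, the conversion is easily computed directly. The short Python sketch below assumes the formula $r = 2 \sin\left(\frac{\pi}{6}\,\mathrm{Rho}\right)$; the function name `r_from_rho` is hypothetical.

```python
from math import pi, sin


def r_from_rho(rho):
    """Estimate product-moment r from rank-difference Rho: r = 2 sin(pi/6 * Rho)."""
    return 2 * sin(pi / 6 * rho)


# Rho = .39 is the value obtained in Table I.
print(round(r_from_rho(0.39), 2))
```

The correction is small throughout the range and vanishes at Rho = 0 and Rho = 1, which is why two-figure tables suffice for small samples.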

The tables given later will handle nearly all problems involving
not more than 50 cases with all the accuracy warranted by their 
sampling errors. 

A fact worthy of notice is that when data have been ranked, the 
computation of approximate medians, quartiles and percentiles is very 
simple. In order to avoid confusion, it is best to rank the lowest score 
1, rather than the highest. The median is then the score corresponding
to the $\frac{N+1}{2}$ rank; the lower quartile, the score corresponding
to the $\frac{N+1}{4}$ rank; the upper quartile, the score corresponding
to the $\frac{3(N+1)}{4}$ rank; and any percentile, the score
corresponding to the $P(N + 1)$ rank, P being expressed as a decimal. We
may then determine the quartile deviation by the usual formula
$Q = \frac{Q_3 - Q_1}{2}$. A


close approximation to the standard deviation is given, for distributions 


approximately normal, by the formula SD = —~ =, 
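
The rank rules for the median, the quartiles and the quartile deviation may be sketched in Python as follows. This is a hedged illustration only. The text does not say what to do when the required rank is fractional, so linear interpolation between the neighbouring scores is assumed here. The function names are hypothetical, and the scores are the X column of Table I.

```python
X = [5, 6, 3, 7, 2, 5, 6, 9, 1, 4, 7, 4, 8, 10, 4]


def score_at_rank(scores, rank):
    """Score at a 1-based rank, the lowest score being rank 1.

    Fractional ranks are resolved by linear interpolation (an assumption).
    """
    ordered = sorted(scores)
    rank = min(max(rank, 1), len(ordered))  # keep the rank inside 1..N
    lower = int(rank)                       # whole part of the rank
    frac = rank - lower
    if frac == 0:
        return float(ordered[lower - 1])
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


def rank_quantiles(scores):
    """Median, lower quartile, upper quartile and quartile deviation Q."""
    n = len(scores)
    median = score_at_rank(scores, (n + 1) / 2)
    q1 = score_at_rank(scores, (n + 1) / 4)
    q3 = score_at_rank(scores, 3 * (n + 1) / 4)
    return median, q1, q3, (q3 - q1) / 2


print(rank_quantiles(X))
```

With N = 15 the required ranks are 8, 4 and 12, all whole numbers, so no interpolation is needed for this example.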


TABLE II.—SQUARES BY HALVES FROM .5 TO 49
Fractional Values .25 Properly Following Alternate Entries in $d^2$ Columns
Have Been Omitted

   d     d²   |    d     d²   |    d      d²   |    d      d²
   .5     0   |  13     169   |  25.5    650   |  38     1444
  1       1   |  13.5   182   |  26      676   |  38.5   1482
  1.5     2   |  14     196   |  26.5    702   |  39     1521
  2       4   |  14.5   210   |  27      729   |  39.5   1560
  2.5     6   |  15     225   |  27.5    756   |  40     1600
  3       9   |  15.5   240   |  28      784   |  40.5   1640
  3.5    12   |  16     256   |  28.5    812   |  41     1681
  4      16   |  16.5   272   |  29      841   |  41.5   1722
  4.5    20   |  17     289   |  29.5    870   |  42     1764
  5      25   |  17.5   306   |  30      900   |  42.5   1806
  5.5    30   |  18     324   |  30.5    930   |  43     1849
  6      36   |  18.5   342   |  31      961   |  43.5   1892
  6.5    42   |  19     361   |  31.5    992   |  44     1936
  7      49   |  19.5   380   |  32     1024   |  44.5   1980
  7.5    56   |  20     400   |  32.5   1056   |  45     2025
  8      64   |  20.5   420   |  33     1089   |  45.5   2070
  8.5    72   |  21     441   |  33.5   1122   |  46     2116
  9      81   |  21.5   462   |  34     1156   |  46.5   2162
  9.5    90   |  22     484   |  34.5   1190   |  47     2209
 10     100   |  22.5   506   |  35     1225   |  47.5   2256
 10.5   110   |  23     529   |  35.5   1260   |  48     2304
 11     121   |  23.5   552   |  36     1296   |  48.5   2352
 11.5   132   |  24     576   |  36.5   1332   |  49     2401
 12     144   |  24.5   600   |  37     1369   |
 12.5   156   |  25     625   |  37.5   1406   |
TABLE III.—VALUES OF $\frac{6}{N^3 - N}$ FOR VALUES OF N FROM 10 TO 50

  N    6/(N³ − N)    |    N    6/(N³ − N)
 10     .00606       |   31     .000202
 11     .00455       |   32     .000183
 12     .00350       |   33     .000167
 13     .00275       |   34     .000153
 14     .00220       |   35     .000140
 15     .00179       |   36     .000129
 16     .00147       |   37     .000119
 17     .00123       |   38     .000109
 18     .00103       |   39     .000101
 19     .000877      |   40
 20     .000752      |   41
 21     .000649      |   42
 22     .000565      |   43
 23     .000494      |   44
 24     .000435      |   45
 25     .000385      |   46
 26     .000342      |   47
 27     .000305      |   48
 28     .000274      |   49
 29     .000246      |   50
 30     .000222      |
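
Any entry of this table can be recomputed from the expression in its title. The Python sketch below is a minimal illustration; the function name `rank_factor` is hypothetical, and ordinary rounding to three significant figures is assumed.

```python
def rank_factor(n):
    """Value of 6 / (N^3 - N), rounded to three significant figures."""
    return float(f"{6 / (n ** 3 - n):.3g}")


# Print the factor for every N covered by the table.
for n in range(10, 51):
    print(n, rank_factor(n))
```

Since the factor depends on N alone, it need be computed only once for each sample size.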
HOW EMOTIONAL TRAITS PREDISPOSE TO COLLEGE
FAILURE 


J. BATEMAN YOUNG 
Colgate University Psychological Laboratory, Hamilton, N. Y. 


Why some college students should win high scholastic honors while 
others drop out due to failure in their studies, is a question which has 
long perplexed educators and psychologists. Intelligence tests have 
given us some information concerning why certain pupils do not 
progress in secondary school subjects, but they have been found 
inadequate to predict success in college after preparatory courses have 
been completed. 

Preparatory courses are in themselves an intelligence test which 
eliminates many from the ranks of potential college students. While 
there is usually a moderate correlation between the intelligence scores 
of those who enter universities and their college grades, the relation is 
not close and failures cannot be predicted accurately. 

As a result, personality factors besides intelligence have been 
discussed as significant in scholastic work. Some berate the “speed
of the age”; others blame home training; still others decry the change
from the old-fashioned curriculum. Everything from the commer- 
cialization of textbooks to general debility of character is assailed by 
men who would appear to know. Those persons given to clear think- 
ing, however, recognize the importance of the problem and urge its 
serious study. Dean Hawkes of Columbia University, for instance, 
has emphasized the need of additional scientific means for judging 
students, especially since only 57 per cent of all freshmen graduate. 

Lack of mental ability is undoubtedly the reason why many 
students leave college, although some others of high intelligence break 
off their college work because of outside circumstances which are 
beyond the control or insight of the university. There was Henry F., 
who never liked college, anyway; it was “not practical enough.” He
was fairly successful during his freshman year, but left college to 
accept work on Wall street. Then there was John F., whose father 
died while he was in school. John’s mother could have supported him 
in school, but she wanted her only child home with her, and another 
promising college career was cut short. 

No tests can be expected to predict such changes in a student’s 
life. Even if they could be foreseen, they would be largely beyond 
control. However, tests which measure principally intelligence tell 
only a part of the prophecy which may be worked out for the student.
Other factors of mental equipment can and should be considered.
Emotional handicaps, seriousness of interests, and favorable personality
traits are among a man’s attributes not recorded by intelligence tests. 

To discover the importance of some non-intelligence factors on 
involuntary withdrawals of college students, the Colgate University 
psychological laboratory has undertaken one of the first studies of 
measurement of emotional factors and the application of these meas- 
urements to the success and failure of undergraduates. 

Freshmen entering Colgate are given a standard intelligence test 
and the two Colgate Mental Hygiene Tests. One of these determines 
extroversion-introversion tendencies; the other records psychoneurotic 
traits. The best description of the personality factors involved in 
extroversion and introversion is that of Jung. 

Broadly speaking, an introvert is a thinker and an extrovert is a 
doer. 

The easy blusher, the man with a clever pen but an awkward 
tongue; with precise habits of dressing, eating and personal affairs; 
who is much affected by praise or censure; who is extremely con- 
scientious and tends to worry; and who has a preference for detailed 
work—has the introvert makeup. On the other hand, the person 
with a ready laugh, a nimble tongue, a happy disposition, an indiffer- 
ence to praise and a love for sports, has the elements of the extrovert. 

Psychoneurotic traits reveal emotional instabilities. The person 
who is suffering from such tendencies may be nauseated at the sight 
of blood, spend much time deciding what to do next, be troubled by 
the idea that people are watching him and, without any reason, run 
the whole gamut of moods. 

Ted D. is a case worth noting. He was a very brilliant writer, 
but was hampered by his psychoneurotic traits. He was extremely 
fond of books, but was as badly discomfited by parties and dances. 
He frequently imagined people were trying to read his mind. Useless 
thoughts continually interfered with his work, rising at times in the 
form of visions. Occasionally he felt a desire to set fire to something. 
He was gawky and unimpressive in appearance and because of these 
peculiarities was termed “wet.” In spite of his literary ability, he 
was a lonely, almost friendless figure on the campus. He became the 
campus joke and was openly treated as such. Finally, he got into 
difficulty with the student board of law enforcement and transferred 
to a larger university, where his peculiarities were lost in the crowds. 
It is to determine these traits of introversion-extroversion and of
psychoneuroticism that the Colgate tests have been devised. They 
afford a personality inventory, or rating scale, based on some 35,000 
hours of work. Other workers in the laboratory have discovered occu- 
pational differences in emotional makeup. Is it not probable that the 
occupation of “college student” requires a certain emotional consti- 
tution as well as a minimum of intelligence? 

There is a zero correlation between the extroversion-introversion 
test and that for psychoneurotic traits. Furthermore, both tests cor- 
relate zero with intelligence at the college level. Taking these two 
Colgate tests and the intelligence examination together, however, we 
have studied the scholastic records of the classes of 1928 and 1929, to 
discover possible causes other than intelligence for success or failure in 
college work. 


INTELLIGENCE NOT A SIGNIFICANT FACTOR


The first year failures of the Class of 1928 were distributed equally 
among men with high and low intelligence scores. In the sophomore 
year of this class, intelligence appeared even less significant in account- 
ing for failures. Seven students above the campus average left school 
because of low scholarship to every four below that average who 
dropped out. This is almost a two-to-one ratio, and is surprising in 
that it seems to indicate that high intelligence predisposes to academic 
failure. The experience of two years with this class reveals a ratio of 
four above the campus average in intelligence flunking out compared 
with three below that average. 

However, the freshman year of the Class of 1929 yields an entirely
different ratio. Only one student out of every ten dismissed for poor
scholarship was a member of the group with more than average
intelligence. When the data for all three years of study are combined, the
ratio becomes 13 above the campus average to 18 below that median,
indicating a tendency for flunks to be below the average for the campus
in intelligence score. The difference is not marked, however, only
about three-fifths of the failures coming from the lower side of the
scale.


THE ROLE OF INTROVERSION-EXTROVERSION


Dismissals for poor scholarship in the class of 1928 were divided 
equally among introverts and extroverts during the first year. The 
extroverts led during the sophomore year, with the ratio eight to three. 
The class of 1929 duplicated the experience of the preceding class for
their first year, extroverts and introverts failing in equal numbers. 
The combined records of the two classes yield a ratio of 18 extroverts 
to 13 introverts. This means that approximately three-fifths of those 
who failed were above the average in extroversion. It is the same 
ratio obtained with the intelligence scores. 

To discover how combinations of extroversion and introversion 
with high and low intelligence affected scholarship, the members of the 
two classes were divided into four groups as follows: (1) Those above 
average in intelligence and introvert; (2) those above average in intelli- 
gence and extrovert; (3) those below average intelligence and introvert; 
and (4) those below average intelligence and extrovert. 

This revealed the fact that the last three groups have exactly the 
same number of failures, while the first group—those high both in 
intelligence and in introversion—has less than one-half of their quota 
of failures. While a man may be introvert or he may be very intelli- 
gent, his chances for failure are normal unless he has both of these char- 
acteristics, in which case the chances are reduced one-half. 

Granting that extroversion or low intelligence predisposes to poor 
scholarship, as data and clinical observations seem to justify, we should 
infer that high intelligence would compensate somewhat for extrover- 
sion and introversion for low intelligence. This does not seem to be 
the case, however, since the last three groups all show a similar number 
of failures. 


CASE STUDIES ILLUSTRATE


George L., for example, is intelligent and extrovert. He was 
dismissed from college due to his low grades, but was able to return 
because of the excellent work he could do when willing to try. He has 
now transferred to a law school and is doing very well. He is success- 
ful because he is working at studies which appear practical to him. 
This seems to corroborate the work of Bear of Centre College, pub- 
lished in the December, 1926, issue of School and Society, in which he 
reported that students interested in some profession make the better 
grades. 

Ralph F. is both extrovert and unintelligent in comparison to the 
majority of the students. He has managed to stay in school, but only 
through unceasing hard work. He has to drive himself incessantly 
to keep his mind on his books long enough to master the daily lessons. 
Despite his determination, he frequently sits for long periods with his 
book in front of him, but with his thoughts on the athletic field. He
wants to be doing something active; he cannot find an outlet for his 
energy in reading and studying. 

Benjamin S. represents the other extreme. He is both highly
intelligent and an introvert. He has made an exceptional college 
record and is almost certain to make Phi Beta Kappa at the end of his 
Junior year. He has already won two prizes for scholarship, earning 
one each of the two years which he has completed. 


HOW PSYCHONEUROTIC TRAITS AFFECT COLLEGE SUCCESS


There is a zero relationship at the college level between intelligence, 
introversion-extroversion and psychoneurotic traits. There is no law 
to their permutations that the Colgate Laboratory has been able to 
discover. 

From the armchair, one would expect the students who were 
psychoneurotic to be especially likely to fail in their college work. 
As a matter of fact, the exact opposite is revealed by our records. 
There are twice as many students with stable emotions leaving because
of failure as there are students with unstable emotional outlets.

Students can be divided into groups on the basis of the combi- 
nation of psychoneurotic traits and introversion-extroversion, viz., (5) 
unstable emotionally and introvert; (6) unstable emotionally and 
extrovert; (7) stable emotionally and introvert and (8) stable emotion- 
ally and extrovert. 

The concentration of failures is in group 8 which has more than 
twice as many failures as any of the other groups. Throughout the 
decile ranges of psychoneurotic traits there is a consistent tendency for 
increasing failures among the extroverts as their psychoneurotic traits 
decrease. Fewest failures were among the unstable extroverts, most 
among the stable extroverts, with an even gradation between these 
extremes. 

Almost half of the failures of the Class of 1928 fell in the emotion- 
ally stable and extrovert group. Samuel J. is worth noting. He is 
still in college, but his scholarship record has been much poorer for his 
second year than it was for his first. He became affiliated with a 
scholarly fraternity which encouraged him to work his first year. 
Then they could control him as a freshman. Now they can not and 
his attitude toward his work has changed, bringing about a marked 
change in grades. 
APPLICATIONS TO ADMINISTRATION


The above data would suggest that where practicable the divisions 
of the various classes should be made on a basis of introversion- 
extroversion and psychoneurotic traits. In this way the instructor 
could very likely find it profitable to use two systems of presenting 
the work. 

This would have to be determined by experiment, but we believe
there are sufficient grounds for such a course. It might even be found
that certain professors have a knack for teaching one group and others 
for teaching the other. The Colgate tests might prove useful in 
selecting the professors. 

This is analogous to boys who “pal around” together being emo- 
tionally similar, a fact which was established in the laboratory last 
year through experiments by Thomas Foote and James Leavenworth. 

Personnel departments should also give particular attention to the 
emotionally steady extroverts and the emotionally unsteady intro- 
verts. These are the two groups in which most of the poor students are 
found. We have nothing to offer as yet as to the best method for the 
personnel officers to use. Robert Elwood is now working on this in 
the laboratory.
NOTES ON ARTICLES IN EDUCATIONAL
PSYCHOLOGY IN CURRENT ISSUES OF

REPORTED BY JAMES E. MENDENHALL


Research Associate, Institute of Educational Research, Teachers College 
Columbia University 











INTELLIGENCE TESTS 


The Thorndike Intelligence Tests and Academic Grades. David Graver and
W. T. Root. The Journal of Applied Psychology, Aug., 1927, 297-318. 500
Freshmen in the University of Pennsylvania. The moderate correlation secured 
from testing does not justify the exclusion of students from college on the basis of 
the Thorndike rating alone. The Thorndike tests should be supplemented with 
individual tests, questionnaires and follow-up studies. 

A Psychological Comparison of Nursery School Children from Homes of Low 
and High Economic Status. Arnold Gesell and Elizabeth Evans Lord. The 
Pedagogical Seminary, Sept., 1927, 339-356. Eleven pairs of children ranging 
from 31 months to 52 months of age were tested and rated on 15 items. The 
children from homes of high economic status were superior in mental equipment, 
“in verbal, practical or emotional abilities.” This favored group also exhibited
greater “spontaneity” and “expressiveness.”

Judging the Intelligence of Boys from Their Photographs. Peter C. Gaskill, 
Norman Fenton, and James P. Porter. The Journal of Applied Psychology,
Oct., 1927, 394-403. Twelve boys, ages 11 to 12, of IQ ranging from 18-171 on
the Stanford-Binet were photographed. 274 individuals in psychology rated
the group: “The median correlation between these ratings and actual IQ ranks was
+.425.” “The correlation coefficient of the median rankings by the total group of
274 for each picture against the correct rankings is +.70.”

A Study of the Non-verbal Nature and Validity of Myers Mental Measure. 
Vernon A. Jones. Journal of Educational Research, Oct., 1927, 203-209. 327 
children from non-English speaking homes and 278 children of native American 
parentage were tested with the Myers Mental Measure and the McCall Multi- 
mental Test. The Myers test showed no tendency to eliminate language factors. 

The Terman and Thurstone Group Tests as Criteria for Predicting College Success. 
M. J. Nelson and E. C. Denny. School and Society, Oct. 15, 1927, 501-502. On 
127 cases the Terman test proves better for predicting success in Psychology I 
than does the Thurstone test. “In predicting probable success in other courses,
however, both tests are rather mediocre.”

Scholarship and Intelligence. J. B. Miner. The Personnel Journal, Aug., 
1927, 113-118. A study reporting the relation of intelligence to scholarship for 
the same groups. Also the relation of intelligence to elimination over a period of 
years. It suggests that first semester marks are of better predictive value than 
Army Alpha. 

Racial Differences in Speed and Accuracy. Otto Klineberg. Journal of 
Abnormal and Social Psychology, Oct.-Dec., 1927, 273-277. The performance of
120 full-blood Indians on the Pintner-Paterson series was compared with the
performance of 110 white children. The results indicate a “qualitative rather than a
quantitative difference . . . the white children were quicker, but the Indian
children made fewer errors.” Differences secured on other investigations may
be due to “language difficulties and educational differences.”


ACHIEVEMENT TESTING 


A New Primary Word Recognition Test with Monthly Norms. W. H. Pyle. 
The Elementary School Journal, Oct., 1927, 137-139. A test in word recognition 
with norms by months was constructed on the basis of words appearing in recent 
primers. 

A Brief Educational Attainment Scale for Clinical Use. J. E. Wallace Wallin
and Margery Gilbert. The Pedagogical Seminary, Sept., 1927, 441-489. A composite
test of reading, arithmetic, spelling, and of formal language development was
given to 491 pupils in the first, second and third grades. The scale thus
standardized furnishes an educational rating of a deficient child “without a due
expenditure of time.”

A Report of Certain Significant Deficiencies of the Accomplishment Quotient. 
Herbert Popenoe. Journal of Educational Research, June, 1927, 40-47. The 
unreliability and unfairness (especially to the brighter pupils) of the accomplishment
quotient as an index of achievement tend to show its inadequacy as a classroom
or general administrative device.


CHARACTER AND PERSONALITY 


Testing the Knowledge of Right and Wrong. Hugh Hartshorne, Mark May and 
others. The Religious Education Association, Monograph No. 1, July, 1927. 
Six articles reporting the investigation of the Character Education Inquiry to date 
are reprinted in pamphlet form. 

The So-called “General Character” Test. Paul A. Witty and Harvey C. Lehman.
Psychological Review, Nov., 1927, 401-414. The writers point out some
limitations and difficulties to be met in any attempt to measure “general character.”

Drive: A Neglected Trait in the Study of the Gifted. Paul A. Witty and Harvey 
C. Lehman. Psychological Review, Sept., 1927, 364-376. Capacity and drive 
are not one and the same thing. Both must be considered. 

The Use of Group Rivalry as an Incentive. Elizabeth B. Hurlock. The 
Journal of Abnormal and Social Psychology, Oct.-Dec., 1927, 278-290. 155
children in Grades IV and VI were grouped and then drilled on the Courtis
Research Tests in arithmetic (addition form) for five days. The Rivalry Group
gained “41 per cent, over and above the practice effect as measured by the Control
Group.” Girls, younger children and inferior pupils profited most from Rivalry.
“Increase in accuracy of performance came only with the application of incentive
(rivalry).”


PSYCHOLOGY OF CHILDHOOD


Occupational Interests of Three Year Old Children. K. M. Banham Bridges.
The Pedagogical Seminary, Sept., 1927, 415-423. Ten children three years of age
and of average intelligence were observed during their free play period.

Extreme Versatility vs. Paucity of Play Interest. Harvey C. Lehman and Orbie
C. Michie. The Pedagogical Seminary, June, 1927, 290-298. A check list
of 200 play activities, used with 5000 children, tended to show that versatility of play
interests “are functions of something other than age, sex, or intelligence.” Based
on teacher ratings of character, they concluded that “non-versatile boys possess
greater self-control” and “versatile boys possess more powerful drives.” Wealth
of play activities is little related to personality traits.

Reflexes in Early Childhood; Their Development, Variability, Evanescence, 
Inhibition, and Relation to Instincts. The British Journal of Medical Psychology, 
Vol. VII, Part I, 1927, 1-35. A genetic observational study throwing some light 
on the theory of instincts. 

Certain Behavior Responses in Early Infancy. Hyman Shalit Lippman. 
Pedagogical Seminary, Sept., 1927, 424-440. 178 infants, 4 to 18 months of age, 
were observed 384 times to determine their ability to grasp one, two and three 
objects, according to the method of Gesell. 


VOCATIONAL AND EDUCATIONAL GUIDANCE


Reactions of High School Pupils to High School Subjects. Jesse E. Adams. 
The School Review, May, 1927, 354-362; June, 1927, 417-427. A study of 4739
schools to discover (a) reasons for failures, (b) vocational choices, (c) difficulty, likes
and dislikes of school subjects.

Interest and Ability in Educational Guidance. Douglas Fryer. Journal of 
Educational Research, June, 1927, 27-39. “Ability as measured by school
grades cannot be dependably predicted from the student’s estimate of his ability
in educational subjects.”

Vocational Interest Test. Edward K. Strong. The Educational Record, April, 
1927, 107-121. A test of interest of college students now being experimented with 
at Stanford University. 


PSYCHOLOGY OF SCHOOL SUBJECTS


Comprehensive Units in Learning Typewriting. J. W. Barton. Psychological 
Monograph No. 164, The Psychological Review Company. This study showed
that practice in typing meaningful or comprehensive units produced greater gain
than practice in typing shorter units.

Children’s Appreciation of Poems. Lynette Feasey. The British Journal of 
Psychology, July, 1927, 51-67. An analysis of children’s responses to poems as to 
preferences and types of judgments. 

Reasons Used in Solving Problems. E. F. Heidbreder. Journal of Experimental
Psychology, Oct., 1927, 397-414. Forty subjects, ranging in age from
2½ years to adulthood, were tested with a number of problem situations. The
reasons for responses given by the subjects were classified as to types and as to
age groups.


MISCELLANEOUS 


Some Effects of a Course in American Race Problems on the Race Prejudice of 450 
Undergraduates at the University of Pennsylvania. Donald Laird. The Journal of
Abnormal and Social Psychology, Oct.-Dec., 1927, 235-242. Both at the beginning
and end of the course 400-500 students were asked to rate races or nationalities
as to “innate ability.” No significant changes in ranking or in willingness
to rank were exhibited in the second test. “On the other side of this sheet, write
descriptions of the first member of the following groups that you recall: (a) Negro,
(b) Chinese, (c) Japanese, (d) Italian, Greek, Mexican, Pole, or Jew, (e) Scandinavian,
German or Englishman.”

The Kindergarten and School Progress. Harry A. Greene. Chicago Schools 
Journal, Oct., 1927, 58-65. In Grades I to VI 1936 cases were measured by 
Haggerty Delta 1 and Delta 2, and by the Stanford Achievement Examination, 
Form B. Also 30,000 other cases were considered in the light of kindergarten
experience and consequent grade progress. “The kindergarten groups were
found to be significantly superior in all grades in general intelligence and also in
general educational achievement. The superiority in educational achievement
apparently exceeds the superiority of the same group in intelligence.” “The
kindergarten groups required a somewhat shorter time to make one semester of
school progress.”

The Measurement of Teaching Efficiency in High School. Percival M. Symonds. 
Educational Administration and Supervision, April, 1927, 217-231. A survey of 
methods (especially rating) by which teaching may be evaluated as to method and 
product.

Results Obtained in a Special “How to Study” Course Given to College Students.
William T. Book. School and Society, Oct. 22, 1927, 529-534. A course in 
“How to Study” is found to improve a student’s studying abilities as evidenced by 
use of time, ability in reading (speed and comprehension) and in grade points 
earned. 

What Constitutes High School Scholarship? Marvin L. Darsie. School and 
Society, Oct. 29, 1927, 565-566. A suggestion is made that rural scholarship 
ratings be made on a percentile basis in addition to those based on average grade 
points earned.


A PROBLEM BOOK IN EDUCATIONAL PSYCHOLOGY


Sketches In and Out of School, by Goodwin B. Watson and Ralph B. 
Spence. New York: Teachers College, Columbia University, 
Published by the Authors, 1927. Pp. XXIV +286. 


Much comment regarding the inadequacy of courses in educational 
psychology has been expressed during recent years. The text- 
books commonly used in such courses have ignored the principles 
of psychology in their mode of presentation. The principles of psychology
therein have been presented somewhat “unpsychologically.”

Applicable principles of learning and understanding of behavior 
problems can only come through attacking and solving such actual 
problems, either real or described. Real situations are somewhat pre- 
cluded by such limitations as time available, the occurrence of certain 
behavior phenomena, and accessibility to the classroom. The authors 
have made good use of the described situation. A multitude of 
representative behavior situations from school and outside of school 
have been assembled.

This “case study syllabus” can not be criticized on the ground
that it is narrow in its scope. A glance at some of the section headings
shows that an attempt has been made to apply psychology to the many
and diverse phases of life. For example, note such topics as “Extra
Curricular Activities,” “Case Studies—Problems of Emotional
Conditioning,” “Relation of Psychology to the Objectives of Education.”
It is encouraging to see this movement toward helping the teacher to 
evaluate the many problems of school and extra-school life in the light 
of scientific method. 

The method of approaching a solution to the case problems pre- 
sented is worthy of attention. The book contains valuable methods 
by which the student can attack his problem. The reference lists are 
extensive and well chosen. 

By way of further criticism, it would seem that there is need for
some provision by which the psychological principles might be more
clearly brought out from the many instances. Little opportunity is
given for drawing out specifically the more comprehensive
generalizations.

This book should prove serviceable in both elementary and 
advanced classes in educational psychology. So far as the transfer 
of book knowledge to applied knowledge can be expected, this case book 
indeed promises well.
JAMES E. MENDENHALL.
The Lincoln School, Teachers College.


SOME TECHNIQUES AND THEIR EDUCATIONAL MEASUREMENTS


Interpretation of Educational Measurements, by Truman L. Kelley.
World Book Company, 1927. Pp. XIII + 363. 


Everyone who has to deal with the interpretation of test results 
should read this book in which Professor Kelley expounds his beliefs 
and explains his practices. After introductory chapters on the 
historical aspects of mental measurement and the purposes served by 
educational tests, the author proceeds directly to the exposition 
of his technique in chapters entitled, The Measurement of Group 
Achievement, The Measurement of Individual Achievement, The 
Determination of Individual Idiosyncrasy, and Studies of Certain 
Inequalities of Development. These chapters tell in detail how test 
results are used by the author, and are amply illustrated with class 
results and individual case studies. 

There is a challenging of common techniques which will be some- 
what disconcerting to many workers in the field. The importance of 
“the ubiquitous probable error” has never before been adequately 
presented in a book on educational measurements, and will be some- 
thing of a shock to many teachers and supervisors who have come to 
look on test results as almost infallible. Even more striking is 
Kelley’s criticism of the Accomplishment Quotient and his entire 
abandonment of the use of general intelligence tests. The reviewer 
has never been entirely convinced of the value of the Accomplishment 
Quotient, but believes that the intelligence tests have proved alto- 
gether too valuable to be thrown aside. Kelley’s plan of substituting 
for the intelligence test a battery of achievement tests, with an 
involved and somewhat questionable scheme of weighting, is probably 
not justified by added dependability of results. 

There follows a chapter on Elementary Statistical Procedures, and 
a very interesting one in which the author presents data which he has 
gathered in support of some of the principles set down previously.
The latter is largely a series of correlation studies, and the conclusions 
represent the ubiquitous error of the statisticians in believing that 
most of our educational problems can be solved by applying correlation 
technique. A correlation coefficient is only an average, and is subject 
to the limitations of averages. There is no more reason for giving up 
general intelligence tests because the correlation is high between them 
and achievement tests than there is for neglecting to make studies of 
individual pupils in classes where the average attainment is satisfactory.

The book concludes with a list of tests, classified by subject and
rated by five expert judges. This list will no doubt be widely used by 
teachers and superintendents who must choose tests, but the reviewer 
is in doubt as to whether this part of the book is as valuable as it 
appears at first sight. Professor Monroe when asked to serve as one 
of the raters refused on the ground that the various tests in a given 
field have peculiar adaptabilities for various purposes, and that they 
should not be compared. There is another more important point. 
No man is competent to rate a series of tests unless he has become 
equally well acquainted with all of them through actual use. It is 
practically certain that none of the raters had this qualification. A bit 
of evidence showing the lack of acquaintance of these experts with some 
of the tests is found in the fact that two important tests are described 
in a form in which they have not been available for five years! Test 
users will do well to go to this list for ideas, but they still need to experi- 
ment with various tests to discover which are best adapted to their 
own particular needs in their own particular schools and school 
systems.
EDWARD A. LINCOLN.
Psycho-educational Clinic, Harvard University.





SOURCE PROBLEMS OF MARKING IN COLLEGES


The Improvement of College Marking Systems, by Ralph B. Spence. 
New York City: Teachers College, Columbia University, Contributions
to Education, No. 252, Bureau of Publications.


The chaotic state of the marking system is justification enough 
for such intensive studies of methods of improvement as the present 
one. Dr. Spence suggests two lines of improvement, namely, the 
improvement of measures upon which marks are based and the 
improvement of methods of combining these grades into units which 
are more comparable and meaningful for each individual. This study 
makes a decided contribution to the second of these problems.

A brief résumé is given of marking systems in use and those suggested
in the literature of education. An experimental attack upon
methods of combining grades is then made. The first problem was
to find a unit which could be handled statistically and one which would
permit the direct comparison of classes and individuals. A standard
deviation unit, the T score, was used.

The care with which the author proceeded is indicated by the fact 
that, although he assumed a normal distribution of scholastic ability, 
he showed that the normal curve was an excellent fit for the ability 
of these groups in terms of intelligence and in terms of the unit of 
scholastic ability used. Furthermore, after building up a comparable 
unit by correcting for as many factors as possible, the improvement 
resulting from the correction of each of these factors was checked. 

Comparisons between the distribution of corrected scholarship 
scores show significant differences between the constituency of various 
classes. Twelve per cent of the class with the lowest median score 
exceeds the median of the highest class. Also, there were significant 
differences between the average scholarship of students in courses of 
different levels and of different departments. The differences in the 
make-up of classes over a period of four successive years appear to be
small. Similar differences in variability exist when classes, depart- 
ments, and levels of courses are compared. 

The Appendix includes a percentage and a T table and models to
illustrate the method of converting letter grades, percentage grades,
and ranks into T scores.
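The conversion of class ranks into a standard deviation unit can be sketched as follows. This is only an illustration, not the procedure tabulated in the monograph: it assumes the conventional T scale (mean 50, standard deviation 10), a normal distribution of ability, and a mid-rank convention for the proportion of the class falling below each student. The function name `ranks_to_t_scores` is hypothetical.

```python
from statistics import NormalDist


def ranks_to_t_scores(ranks):
    """Convert class ranks (1 = best) into T scores.

    Assumptions (not taken from the monograph): T scale with mean 50 and
    SD 10, normally distributed ability, and the mid-rank convention
    for the proportion of the class below each student.
    """
    n = len(ranks)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    t_scores = []
    for rank in ranks:
        # Proportion of the class lying below the midpoint of this rank.
        proportion_below = (n - rank + 0.5) / n
        # Standard deviation units above or below the class mean.
        z = std_normal.inv_cdf(proportion_below)
        t_scores.append(round(50 + 10 * z, 1))
    return t_scores


# A class of five students ranked 1 (best) to 5; the middle rank falls at 50.
print(ranks_to_t_scores([1, 2, 3, 4, 5]))
```

Because every class is expressed on the same scale, scores obtained this way could in principle be compared across classes, which is the purpose the review ascribes to the comparable unit. Any correction for differences in class ability, such as the study describes, would have to be applied in addition.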

The chief contributions of the study are the statistical techniques 
used to weight relevant factors and develop a comparable unit and a 
proposed plan for handling college grades. The most novel suggestion 
is that the instructor be relieved of assigning values to his marks. He 
would be asked to rank the members of his class according to their 
relative achievements. It then would be the duty of the administrative
officers to equate the ranks and determine their administrative values. 

Such a procedure as proposed would give a meaning to marks which 
they now do not possess. So far as credit for courses is concerned, 
there would be no “snap” courses. The method, if widely adopted,
would make possible more accurate comparisons between colleges. 
The method, likewise, is as suitable for use in large elementary and high 
schools as for colleges. The investigation and proposals warrant care- 
ful study and discussion by public school and college authorities. 

C. O. MATHEWS.
Ohio Wesleyan University.